Abstract

Information causality has been proposed to constrain the maximal mutual information shared
between sender and receiver in a communication protocol based on physical theories such as
quantum mechanics. In this paper, we test this proposal for more general quantum communication
protocols with multi-level and (non-)symmetric channels by directly evaluating the mutual
information. We use the random access code and no-signaling boxes to formulate Bell-type
inequalities, and semidefinite programming to find the generalized Tsirelson bound. Our results
support information causality, which is never violated in the more general settings discussed in
this work. For the 2-level and 2-setting cases, we also find that information causality is
saturated not by the channels with the maximal quantum non-locality associated with Tsirelson's
inequality but by the marginal cases saturating Bell's inequality. This indicates that more
quantum non-locality may not always yield more mutual information.
I. INTRODUCTION

The advantage of quantum information over its classical counterpart in computing and
communication has been well exploited in the past decades. The abstract form of this advantage
in the communication process is termed communication complexity [1]. Despite the
seemingly non-local feature of the quantum entanglement employed in the quantum communication
process, its communication complexity is bounded and not maximal^1. Recently, the
bound was formulated for general physical theories and termed information causality [4],
which states that the maximal mutual information shared between the sender and receiver
in a communication protocol with resources based on physical theories cannot exceed the
amount of classical communication. The criterion of information causality selects a subset
of non-signaling theories, including quantum mechanics.

If information causality holds for all realistic communication processes, it can
be promoted to a new physical principle for formulating the fundamental theories from the
information-theoretical point of view. Moreover, one may wonder whether quantum mechanics
is equivalent to the theories saturating the non-local bound given by information causality,
or whether quantum mechanics cannot saturate information causality. However, most of the
tests of the above questions have been performed only for two-level quantum communication
protocols. In particular, in [5] it was shown that the non-local bounds from information causality
exactly coincide with the generalized Tsirelson inequalities for a particular set of two-level
quantum protocols (of multi-settings). Related tests based on information causality and
macroscopic locality^2 [6] were carried out in [7].

^1 In [2], by sharing a PR-box [3], any distributed decision problem can be solved with perfect success with
only one bit of communication. This means the communication complexity is trivial. Therefore, the
communication complexity is related to the non-locality. Thus, in this paper we use the terms "the communication
complexity" and "the non-locality" interchangeably when we discuss the non-local correlations.

^2 Macroscopic locality states that any physical theory should recover classical physics in the continuum
limit, i.e., the probability distributions of a large number of particles should satisfy Bell's inequalities.



It is then interesting to test information causality in quantum mechanics for more
general quantum communication protocols, e.g., multi-level ones^3. In this work, we make
efforts along this direction: we try to find the maximal bound of the mutual
information shared between sender and receiver for multi-level quantum communication
protocols, and compare it with the bound from information causality.

In our communication protocol, Alice has a database of k elements, denoted by the vector
$\vec{a} = (a_0, a_1, \ldots, a_{k-1})$. Each element $a_i$ is a d-level random variable and is known only to Alice.
A second, distant party, Bob, is given a random variable $b \in \{0, 1, 2, \ldots, k-1\}$. The value of b
instructs Bob to optimally guess the d-level bit (d-bit) $a_b$ after receiving a d-bit $\alpha$
sent from Alice, with the help of the pre-shared correlation between Alice and Bob. In this context,
information causality can be formulated as follows:

$$ I = \sum_{i=0}^{k-1} I(a_i; \beta \,|\, b = i) \le \log_2 d, \qquad (1.1) $$

where $I(a_i; \beta \,|\, b = i)$ is Shannon's mutual information between $a_i$ and Bob's guess d-bit $\beta$
under the condition b = i. The bound is the amount of classical communication encoded
in $\alpha$.

^3 In this paper, protocols that use non-uniform $\Pr(a_i)$ or more general communication channels, such as
asymmetric and anisotropic channels, belong to the more general communication protocols.

FIG. 1. The NS-box and the channel.

To be specific, we use the random access code (RAC) to encode Alice's database $\vec{a}$ into
$\vec{x} := (x_1, \ldots, x_{k-1})$ with $x_i = a_i - a_0$, and Bob's input b into $\vec{y} := (y_1, \ldots, y_{k-1})$ with
$y_i = \delta_{b,i}$ for $b \neq 0$ and $\vec{y} = \vec{0}$ for b = 0. In addition, the communication protocol is supplemented
by the pre-shared correlation, called the non-signaling box (NS-box). The above $\vec{x}$ and $\vec{y}$
are the inputs of the NS-box, and their corresponding outcomes are denoted by $A_{\vec{x}}$ and $B_{\vec{y}}$,
respectively. Therefore, the NS-box, and thus the communication protocol, is characterized
by the conditional joint probabilities $\Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y})$ satisfying the following non-signaling
conditions:

$$ \sum_{B_{\vec{y}}} \Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y}) = \Pr(A_{\vec{x}} | \vec{x}) \quad \text{and} \quad \sum_{A_{\vec{x}}} \Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y}) = \Pr(B_{\vec{y}} | \vec{y}), \quad \forall\, \vec{x}, \vec{y}. \qquad (1.2) $$

This implies that superluminal signaling is impossible.
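As an illustrative sketch (not taken from the original work), the non-signaling conditions (1.2) can be checked numerically for a box with two inputs and two outputs per party. The function names `pr_box`, `signaling_box` and `is_no_signaling` are hypothetical, and the PR-box of [3] is written under the common convention that the outputs satisfy a XOR b = x AND y.

```python
def pr_box(a, b, x, y):
    """Pr(a, b | x, y) for a PR-box: uniform outputs with a XOR b = x AND y."""
    return 0.5 if (a ^ b) == (x & y) else 0.0


def signaling_box(a, b, x, y):
    """A deliberately signaling box: Alice outputs 0, Bob outputs Alice's input."""
    return 1.0 if (a == 0 and b == x) else 0.0


def is_no_signaling(box, n_inputs=2, n_outputs=2, tol=1e-12):
    """Return True if each party's marginal is independent of the other's input."""
    inputs, outputs = range(n_inputs), range(n_outputs)
    # Alice's marginal Pr(a | x) must not depend on Bob's input y.
    for x in inputs:
        for a in outputs:
            marginals = [sum(box(a, b, x, y) for b in outputs) for y in inputs]
            if max(marginals) - min(marginals) > tol:
                return False
    # Bob's marginal Pr(b | y) must not depend on Alice's input x.
    for y in inputs:
        for b in outputs:
            marginals = [sum(box(a, b, x, y) for a in outputs) for x in inputs]
            if max(marginals) - min(marginals) > tol:
                return False
    return True


print(is_no_signaling(pr_box), is_no_signaling(signaling_box))
```

The PR-box passes the check although it is more non-local than any quantum box, while the second box fails because Bob's marginal reveals Alice's input.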

Here, we should mention that the mutual information in (1.1) refers to the channel
characterized by the conditional probability $\Pr(\beta | a_i, b = i)$ with RAC decoding $\beta = \alpha + B_{\vec{y}}$,
which relates the outcome $\beta$ of the channel to its inputs $a_i$ and b. Note that the channel
probability $\Pr(\beta | a_i, b = i)$ is not completely determined by the conditional joint
probability $\Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y})$ alone; it also depends on the input marginal probability $\Pr(a_i)$. In a sense, the
flow of the NS-box is perpendicular to the flow of the channel; this is shown schematically in
Fig. 1. We will see that the difference between these two flows is relevant to how to maximize
the mutual information I in (1.1).

Naively, one would formulate the whole problem as maximizing the mutual information I
of the protocol characterized by Alice's input marginal probabilities $\Pr(a_i)$ and the channel
derived from the quantum correlations. To proceed, we have to make sure that the whole problem
can be formulated as a convex optimization problem [23, 24], so that numerical
recipes such as [20] can be used for maximizing I. However, we will show that this is
not a convex problem if we maximize I by varying over the input marginal
probability $\Pr(a_i)$ and the joint probability $\Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y})$ of the NS-box^4. Therefore,
the numerical recipe [20] cannot be applied in this way to find the mutual information bound
for information causality. Observe that our problem here is different from the usual way of
determining the channel capacity. The usual way of finding the channel capacity for a given
channel, i.e., with fixed $\Pr(\beta | a_i, b = i)$, is to maximize the mutual information I over the input
marginal probability $\Pr(a_i)$. This is a convex optimization problem and can be solved
numerically by a recipe such as [20].

^4 In fact, the mutual information I defined in (1.1) depends on $\Pr(a_i)$ and only on $\Pr(B_{\vec{y}} - A_{\vec{x}} | \vec{x}, \vec{y})$, not on the
full $\Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y})$. Thus, later when referring to the joint probability of the NS-box, we will in fact mean
$\Pr(B_{\vec{y}} - A_{\vec{x}} | \vec{x}, \vec{y})$, though we may use the expression $\Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y})$.

To bypass this no-go situation, a naive way is to maximize I over $\Pr(a_i)$ and $\Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y})$
numerically by brute force, without relying on convex optimization. Its viability is, however,
restricted by the available computational power, and it cannot work for complicated protocols such as
those with multiple levels and multiple settings. Despite that, we will try some simpler cases
and verify that the bound from information causality may not be the same as the generalized
Tsirelson bound associated with quantum non-locality.

The other way of bypassing the no-go is to consider some special cases in which the generalized
Tsirelson bound agrees with the bound for information causality. With a suitable
object for quantum communication complexity, e.g., the CHSH function in the Bell inequality,
maximizing the object to determine the optimal quantum channel is a convex optimization
problem. Thus, we need to find the special conditions under which the mutual information I
is monotonically increasing with the object for quantum communication complexity. We can
then bypass the no-go and find the bound on information causality for quantum
channels. For such cases, maximizing the mutual information over quantum channels is the same
as finding the generalized Tsirelson bound. This is also the way adopted in this work.

Before the proof, one important issue is to find the appropriate object of communication
complexity, which is called a Bell-type function in [17-19]. For two-level protocols,
the CHSH correlation function gives the natural object for the communication complexity.
Moreover, the Tsirelson theorem helps to yield the Tsirelson inequalities that constrain
$\Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y})$ of the NS-box. However, for multi-level protocols, there is neither an analogue of the
CHSH correlation as a natural object for the communication complexity, nor Tsirelson-type
inequalities to yield suitable Bell-type inequalities for the quantum constraint. Despite
that, an alternative way to derive the Tsirelson inequalities for two-level protocols was found
in [5], and it will be generalized to multi-level protocols in this paper. It is based
on the signal decay theorem [10, 11]. Consider a cascade of two communication channels,
$X \to Y \to Z$, with the second channel being binary and symmetric with a noise parameter
$\xi$. Then,

$$ \frac{I(X;Z)}{I(X;Y)} \le \xi^2. \qquad (1.3) $$

Applying this to our protocol of RAC and NS-box, one can arrive at

$$ I(a_i; \beta \,|\, b = i) \le \xi_i^2, \qquad (1.4) $$

where $\xi_i$ is the coding noise parameter and can be expressed in terms of the joint probability
$\Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y})$ and the input marginal probability $\Pr(a_i)$. Assuming that the $a_i$'s are independent and
identically distributed (i.i.d.), we can then sum over i to obtain a bound on I. Using
the Cauchy inequality, one can linearize the above quadratic inequality to arrive at

$$ \sum_i \xi_i \qquad (1.5) $$

as the object for communication complexity. Furthermore, assuming uniform probabilities
for all of Alice's database entries $a_i$, we showed in [5] that information causality can be formulated
as $\left|\sum_i \xi_i\right| \le \sqrt{k}$, and is exactly equivalent to the Tsirelson-type inequalities for two-level
protocols. Similar considerations can also be found in [8, 9].

Rather than being derived as a mathematical theorem, the above derivation of (1.5) for Tsirelson-type
inequalities is based on the physical content of the communication protocols, and can be
generalized to the multi-level cases. This is one of the main tasks of this paper. To bypass
the no-go described above, we only consider i.i.d. inputs $\{a_i\}$ with uniform $\Pr(a_i)$, in
symmetric and isotropic channels. The symmetric channel is defined as usual, and the
isotropic channel means that the noise parameter of the NS-box is uniform, i.e., $\xi_i$ is independent
of i. Based on these assumptions, we can show that I and (1.5) are monotonically related,
so that maximizing I is equivalent to maximizing (1.5). One can then use the standard
semidefinite programming (SDP) [23, 24] algorithm, which is also a linear convex optimization
program, to build up the quantum channel. Based on these quantum channels with
uniform $\Pr(a_i)$, one can further calculate the mutual information I. This is shown schematically
in Fig. 2. In this way, we can test information causality for the more generic
multi-level quantum communication protocols.

The above discussion is for multi-level symmetric channels. Although we can turn
the maximization of the mutual information I into a convex optimization problem, this requires many
assumptions. If we would like to consider asymmetric and anisotropic channels with
possibly non-uniform $\Pr(a_i)$, how should we proceed? Instead of using convex optimization, we
are forced to use the brute-force method mentioned before. That is, we have to pick
all the sets of quantum correlations $\Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y})$ and input marginal probabilities $\Pr(a_i)$
and then evaluate the corresponding mutual information I. We compare all of them to find
the maximal one. In this way, we find that the boundaries for information causality and
quantum non-locality may not agree. That means that, for general communication channels,
maximizing the mutual information I over quantum channels is not equivalent to finding the
generalized Tsirelson bound. However, due to the demanding computational resources required by the
numerical optimization, we will only consider the two-level and two-setting case.

FIG. 2. Scheme for maximizing the mutual information I over quantum channels.

The paper is organized as follows. In the next section we define our communication
protocols based on RAC and NS-box and then derive the objects for the communication
complexity of symmetric quantum channels with i.i.d. and uniform input marginal probabilities.
In section III, we show that maximizing the mutual information I over the joint
probabilities $\Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y})$ and input marginal probabilities $\Pr(a_i)$ is not a convex
optimization problem, and also prove that (1.5) and the mutual information I are monotonically
related. In section IV, we briefly review the SDP algorithm proposed in [18, 19] and then
apply it to maximize (1.5) numerically for the multi-level symmetric channels with i.i.d. and
uniform input marginal probabilities. Our numerical results for the maximal quantum
non-locality are used to evaluate the corresponding mutual information I, which is compared with
information causality. In section V, we use brute force to maximize the mutual information I
for more general communication channels. Finally, in section VI we conclude our paper
with some discussions.

In Appendix A, we briefly review the signal decay theorem for binary channels mentioned
in [10, 11] and then generalize it to multi-nary channels. In Appendix B, we give a proof that
the mutual information I is not a concave function of the joint probabilities and input marginal
probabilities. In Appendix C, the standard primal and dual problems for SDP are defined.
We can rewrite our problem in the standard form and use numerical recipes to solve it. In
Appendix D, we write down the quantum constraints for the first and second steps of the
convex optimization program. We also estimate the number of quantum constraints
and explain how to write down these constraints efficiently.



II. MULTI-LEVEL BELL-TYPE INEQUALITY FROM SIGNAL DECAY THEOREM

In the Introduction, we briefly described our communication protocol for two distant
parties, Alice and Bob: given one encoded d-bit $\alpha$ from Alice and one random number b, Bob
needs to optimally guess $a_b$ in Alice's database $\vec{a} := (a_0, \ldots, a_{k-1})$. In this task, Bob can use
the pre-shared correlations simulated by the NS-box, whose inputs are Alice's encoded
d-bit string $\vec{x}$ and Bob's $\vec{y}$, as mentioned previously. The corresponding outputs of the NS-box
are $A_{\vec{x}}$ and $B_{\vec{y}}$, respectively. More specifically, the d-bit sent by Alice is $\alpha = A_{\vec{x}} - a_0$, and
the pre-shared correlation is defined by the conditional probability $\Pr(B_{\vec{y}} - A_{\vec{x}} = \vec{x}\cdot\vec{y} \,|\, \vec{x}, \vec{y})$
between the inputs and outputs of the NS-box. Accordingly, Bob's optimal guess d-bit
$\beta$ can be chosen as $B_{\vec{y}} - \alpha$. This is because $\beta = B_{\vec{y}} - A_{\vec{x}} + a_0 = \vec{x}\cdot\vec{y} + a_0$ as long as
$B_{\vec{y}} - A_{\vec{x}} = \vec{x}\cdot\vec{y}$ holds. In this case, Bob guesses $a_b$ perfectly. Take d = 3 and k = 3 as an
example for illustration: Bob's optimal guess d-bit is

$$ \beta = \vec{x}\cdot\vec{y} + a_0 = (a_1 - a_0,\, a_2 - a_0)\cdot(y_0, y_1) + a_0. \qquad (2.1) $$

If Bob's input is $\vec{y} = (y_0, y_1) = (0,0)$, then $\beta = a_0$; if $\vec{y} = (y_0, y_1) = (1,0)$, then $\beta = a_1$; and if
$\vec{y} = (y_0, y_1) = (0,1)$, then $\beta = a_2$. Bob can guess $a_b$ perfectly.
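The d = 3, k = 3 example can be checked exhaustively with a short sketch. It assumes a noiseless NS-box, i.e., one whose outputs satisfy B - A = x.y with certainty, and it assumes that all arithmetic is taken modulo d; the function name `rac_guess` is hypothetical.

```python
from itertools import product


def rac_guess(a, b, d, alice_outcome):
    """Bob's guess beta for database a and index b, using a noiseless NS-box.

    Assumption: the box satisfies B - A = x.y (mod d) with certainty.
    """
    k = len(a)
    x = [(a[i] - a[0]) % d for i in range(1, k)]     # Alice's box input
    y = [1 if i == b else 0 for i in range(1, k)]    # Bob's box input (all zeros for b = 0)
    x_dot_y = sum(xi * yi for xi, yi in zip(x, y)) % d
    bob_outcome = (alice_outcome + x_dot_y) % d      # noiseless correlation
    alpha = (alice_outcome - a[0]) % d               # d-bit sent by Alice
    return (bob_outcome - alpha) % d                 # beta = B - alpha


d, k = 3, 3
all_correct = all(
    rac_guess(a, b, d, A) == a[b]
    for a in product(range(d), repeat=k)
    for b in range(k)
    for A in range(d)
)
print(all_correct)
```

With a noisy box the relation B - A = x.y holds only with some probability, which is what the noise parameter introduced in this section quantifies.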

Using the above communication protocol, Alice and Bob have $d^{k-1}$ and k measurement
settings, respectively, and each measurement setting gives d possible outcomes.
However, the noise of the NS-box affects the communication complexity, so that Bob
cannot always guess the d-bit $a_b$ correctly with the pre-shared correlation. If the NS-box is a
quantum mechanical one, then the conditional probability $\Pr(B_{\vec{y}} - A_{\vec{x}} = \vec{x}\cdot\vec{y} \,|\, \vec{x}, \vec{y})$ should be
constrained by quantum non-locality, and so is the joint probability $\Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y})$. The
question is then how. For d = 2 and k = 2, the quantum constraint comes from the well-known
Tsirelson inequality. That is, the CHSH correlation function $C_{\vec{x},\vec{y}}$, which can be expressed in
terms of the joint probability as $\Pr(00|x,y) - \Pr(01|x,y) - \Pr(10|x,y) + \Pr(11|x,y)$ for uniform
output marginal probabilities, is the object for quantum communication complexity bounded
by $2\sqrt{2}$. This is the constraint for $\Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y})$ to be consistent with quantum mechanics.

However, there is no known Tsirelson inequality for the cases with d > 2. Despite that,
in [5] we found a systematic way to construct d = 2 multi-setting Tsirelson inequalities from
the signal decay theorem [10, 11]. We will generalize this method to the $d \ge 3$ case to yield
suitable objects for quantum communication complexity. To proceed, we first recapitulate
the derivation for the d = 2 cases.

The signal decay theorem quantifies the loss of mutual information when data are processed through
a noisy channel. Consider a cascade of two communication channels, $X \to Y \to Z$; then
intuitively we have

$$ I(X;Z) \le I(X;Y). \qquad (2.2) $$

Moreover, if the second channel is a binary symmetric one, i.e.,

$$ \Pr(Z|Y) = \begin{pmatrix} \frac{1}{2}(1+\xi) & \frac{1}{2}(1-\xi) \\ \frac{1}{2}(1-\xi) & \frac{1}{2}(1+\xi) \end{pmatrix}, $$

then the signal decay theorem says

$$ \frac{I(X;Z)}{I(X;Y)} \le \xi^2. \qquad (2.3) $$

This bound has been proven to be tight in [10, 11]. Note that the equality holds only
when a weak signal is propagated through the noisy channel $\Pr(Z|Y)$, i.e., when $\Pr(Y|X = 0)$ and
$\Pr(Y|X = 1)$ are almost indistinguishable. For more detail, please see Appendix A.
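The statement (2.3) can be probed numerically with a small sketch. As an assumption made only for this illustration, the first channel X -> Y is also taken to be binary symmetric, with parameter `eta`, and X is uniform; the helper names `bsc`, `compose`, `mutual_information` and `decay_ratio` are hypothetical. Under these assumptions the ratio should stay below $\xi^2$ and approach it as the signal gets weak (small `eta`).

```python
from math import log2


def bsc(xi):
    """Binary symmetric channel with Pr(out = in) = (1 + xi) / 2."""
    return [[(1 + xi) / 2, (1 - xi) / 2], [(1 - xi) / 2, (1 + xi) / 2]]


def compose(first, second):
    """Channel matrix of the cascade: sum_y Pr(y | x) Pr(z | y)."""
    return [
        [sum(first[x][y] * second[y][z] for y in range(len(second)))
         for z in range(len(second[0]))]
        for x in range(len(first))
    ]


def mutual_information(p_in, channel):
    """Mutual information in bits for input distribution p_in and channel[x][z]."""
    n_out = len(channel[0])
    p_out = [sum(p_in[x] * channel[x][z] for x in range(len(p_in)))
             for z in range(n_out)]
    total = 0.0
    for x, px in enumerate(p_in):
        for z in range(n_out):
            joint = px * channel[x][z]
            if joint > 0:
                total += joint * log2(joint / (px * p_out[z]))
    return total


def decay_ratio(eta, xi):
    """I(X;Z) / I(X;Y) for uniform X, X -> Y = bsc(eta), Y -> Z = bsc(xi)."""
    p_x = [0.5, 0.5]
    first = bsc(eta)
    return (mutual_information(p_x, compose(first, bsc(xi)))
            / mutual_information(p_x, first))


for eta in (1.0, 0.3, 0.01):
    print(eta, decay_ratio(eta, 0.5), "bound:", 0.5 ** 2)
```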
In [5], we defined $X = a_i$, $Y = a_0 + \vec{x}\cdot\vec{y}$ and $Z = \beta$. On purpose, the bit $a_i$ is encoded
as $a_0 + \vec{x}\cdot\vec{y}$ such that $I(a_i; a_0 + \vec{x}\cdot\vec{y}) = 1$. Using the tight bound of (2.3), in this case we
can get

$$ I(a_i; \beta \,|\, b = i) \le \xi_i^2. \qquad (2.4) $$

For our communication protocol, the noise parameter $\xi_i$ is denoted as $\xi_{\vec{y}}$ and can be related
to Alice's input marginal probability and the joint probability of the NS-box as follows:

$$ \xi_{\vec{y}} = 2 \sum_{\{\vec{x}\}} \Pr(\vec{x}) \Pr(B_{\vec{y}} - A_{\vec{x}} = \vec{x}\cdot\vec{y} \,|\, \vec{x}, \vec{y}) - 1. \qquad (2.5) $$

Assuming that Alice's database is i.i.d., we can then sum the mutual information between
$\beta$ and $a_i$ over i to arrive at

$$ \sum_i I(a_i; \beta \,|\, b = i) \le \sum_i \xi_i^2. \qquad (2.6) $$

Though this object is quadratic, we can linearize it by the Cauchy-Schwarz inequality, i.e.,
$\left|\sum_i \xi_i\right| \le \sqrt{k \sum_i \xi_i^2}$. For the d = k = 2 case with uniform input marginal probabilities $\Pr(a_i)$,
it is easy to show that $\sum_i \xi_i \le \sqrt{2}$ (or $\sum_i \xi_i^2 \le 1$) is nothing but the conventional Tsirelson
inequality. Moreover, in [5] we used the SDP algorithm in [17] to generalize to the d = 2 and
k > 2 cases and showed that the corresponding Tsirelson inequalities are

$$ \sum_i \xi_i \le \sqrt{k}. \qquad (2.7) $$

This is equivalent to saying $\sum_i \xi_i^2 \le 1$. From the signal decay theorem (2.4) we find that this
implies that the quantum communication complexity is consistent with information causality
(1.1).
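To make the d = k = 2 statement concrete, the following sketch evaluates the noise parameters for three boxes from their success probabilities, using the formula written in the docstring (the d = 2 form of the noise parameter above). Two things are assumptions for illustration and are not taken from the text: that the optimal quantum (CHSH) strategy achieves $\Pr(B - A = x \cdot y) = \cos^2(\pi/8)$ for every input pair, and the helper names `noise_parameter` and `xi_vector`.

```python
from math import cos, pi, sqrt


def noise_parameter(p_x, p_win, d=2):
    """xi_y = (d * sum_x Pr(x) * Pr(B - A = x.y | x, y) - 1) / (d - 1)."""
    success = sum(px * pw for px, pw in zip(p_x, p_win))
    return (d * success - 1) / (d - 1)


def xi_vector(p_win_by_y, p_x=(0.5, 0.5)):
    """Noise parameters for each of Bob's inputs y (d = k = 2, uniform x)."""
    return [noise_parameter(p_x, p_win) for p_win in p_win_by_y]


# p_win_by_y[y][x] = Pr(B - A = x.y | x, y)
q = cos(pi / 8) ** 2
boxes = {
    "classical (A = B = 0)": [[1.0, 1.0], [1.0, 0.0]],
    "quantum (Tsirelson-optimal)": [[q, q], [q, q]],
    "PR-box": [[1.0, 1.0], [1.0, 1.0]],
}
for name, p_win in boxes.items():
    xi = xi_vector(p_win)
    # Columns: box, sum of xi (compare with sqrt(2)), sum of xi^2 (compare with 1).
    print(name, sum(xi), sum(v * v for v in xi))
```

The classical box stays below $\sqrt{2}$, the quantum box sits on the bound, and the PR-box exceeds it.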

We now generalize the above construction to the d > 2 cases. First, we start with the d = 3
case by considering a cascade of two channels $X \to Y \to Z$ with the second one a 3-input,
3-output symmetric channel. Again, we want to find the upper bound of $I(X;Z)/I(X;Y)$. In
Appendix A we show that the ratio saturates the upper bound whenever the three conditional
probabilities $\Pr(Y|X = i)$ with i = 0, 1, 2 are almost indistinguishable. Moreover, it can
also be shown that the upper bound of the ratio is again given by (2.3) for the symmetric
channel between Y and Z specified by

$$ \Pr(Z|Y) = \begin{pmatrix} \frac{2\xi+1}{3} & \frac{1-\xi}{3} & \frac{1-\xi}{3} \\ \frac{1-\xi}{3} & \frac{2\xi+1}{3} & \frac{1-\xi}{3} \\ \frac{1-\xi}{3} & \frac{1-\xi}{3} & \frac{2\xi+1}{3} \end{pmatrix}. \qquad (2.8) $$

One can generalize the above to higher d for the symmetric channel between
Y and Z specified as follows: $\Pr(Z = i | Y = i) = \frac{(d-1)\xi+1}{d}$ and $\Pr(Z = s \neq i | Y = i) = \frac{1-\xi}{d}$
with $i \in \{0, 1, \ldots, d-1\}$. Again we arrive at (2.3). Based on the signal decay theorem with
$X := a_i$, $Y := a_0 + \vec{x}\cdot\vec{y}$ and $Z := \beta$, and assuming that Alice's input probabilities are i.i.d.,
we can sum over all the mutual information between each $a_i$ and $\beta$,

$$ \sum_{i=0}^{k-1} I(a_i; \beta \,|\, b = i) \le \log_2 d \sum_{i=0}^{k-1} \xi_i^2. \qquad (2.9) $$

In our communication protocol, the noise parameter $\xi_i$ is denoted as $\xi_{\vec{y}}$ and can be
expressed as

$$ \xi_{\vec{y}} = \frac{d \sum_{\vec{x}} \Pr(\vec{x}) \Pr(B_{\vec{y}} - A_{\vec{x}} = \vec{x}\cdot\vec{y} \,|\, \vec{x}, \vec{y}) - 1}{d-1}. \qquad (2.10) $$

As for the d = 2 case, we assume the upper bound of (2.9) is capped by information
causality to yield a quadratic constraint on the noise parameters. Again, using the
Cauchy-Schwarz inequality to linearize the quadratic constraint, we find the generalized
inequality $\sum_{\vec{y}} \xi_{\vec{y}} \le \sqrt{k}$. This inequality could be the Tsirelson-type inequality, and this
needs to be checked. In particular, if the input marginal probabilities $\Pr(a_i)$ are uniform, the
bound on the object $\sum_{\vec{y}} \xi_{\vec{y}}$ yields a constraint on $\Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y})$. Therefore, we obtain a
proper object for characterizing non-locality: $\sum_{\vec{y}} \xi_{\vec{y}}$ with uniform $\Pr(a_i)$.

We are then ready to ask the question: do the joint probabilities $\Pr(A_{\vec{x}}, B_{\vec{y}} | \vec{x}, \vec{y})$ giving
the maximal quantum non-locality saturate the upper bound of information causality? Next,
we are going to answer this question.

III. CONVEXITY AND MUTUAL INFORMATION

1. Feasibility for maximizing mutual information by convex optimization?

In order to test information causality for different quantum communication protocols, we
have to maximize the mutual information I over the quantum channel and Alice's input probability.
One way is to formulate the problem as a convex optimization program, so that we
may exploit numerical recipes such as [20] to carry out the task.

Minimizing a convex function subject to equality or inequality constraints is called convex
optimization. The object function could be linear or non-linear. For example, SDP is a kind of
convex optimization with a linear object function. Regardless of whether the object functions
are linear or non-linear, the minimization (maximization) problem requires them to be convex (concave).
Thus, if we define the mutual information I as the object function for maximization in the
context of information causality, we have to check its concavity.

A concave function f(x) ($f: \mathbb{R}^n \to \mathbb{R}$) should satisfy the following condition:

$$ f(\lambda x_1 + (1-\lambda) x_2) \ge \lambda f(x_1) + (1-\lambda) f(x_2), \qquad (3.1) $$

where $x_1$ and $x_2$ are n-dimensional real vectors, and $0 \le \lambda \le 1$.

Mutual information between input X and output Z can be written as

$$ I(X;Z) = H(Z) - H(Z|X) = H(Z) - \sum_i \Pr(X = i) H(Z|X = i), \qquad (3.2) $$

where $H(Z) = -\sum_i \Pr(Z = i) \log_2 \Pr(Z = i)$ is the entropy function. We will study
the convexity of I(X;Z) by varying over the marginal probability $\Pr(X)$ and the channel
probability $\Pr(Z|X)$.

The following theorem is mentioned in [21]: if we fix the channel probability $\Pr(Z|X)$ in
(3.2), then I(X;Z) is a concave function with respect to $\Pr(X)$. This is the usual way of
obtaining the channel capacity, i.e., maximizing the mutual information I over the input marginal
probability for a fixed channel.
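The concavity in $\Pr(X)$ for a fixed channel can be spot-checked against definition (3.1) with a minimal sketch. The channel matrix below is an arbitrary example chosen for illustration, and the function names `mutual_information` and `concavity_gap` are hypothetical.

```python
from math import log2


def mutual_information(p_in, channel):
    """I(X;Z) in bits for input distribution p_in and channel[x][z] = Pr(z | x)."""
    n_out = len(channel[0])
    p_out = [sum(p_in[x] * channel[x][z] for x in range(len(p_in)))
             for z in range(n_out)]
    total = 0.0
    for x, px in enumerate(p_in):
        for z in range(n_out):
            joint = px * channel[x][z]
            if joint > 0:
                total += joint * log2(joint / (px * p_out[z]))
    return total


def concavity_gap(channel, p1, p2, lam=0.5):
    """I(lam*p1 + (1-lam)*p2) - [lam*I(p1) + (1-lam)*I(p2)]; >= 0 if concave."""
    mix = [lam * u + (1 - lam) * v for u, v in zip(p1, p2)]
    return (mutual_information(mix, channel)
            - lam * mutual_information(p1, channel)
            - (1 - lam) * mutual_information(p2, channel))


channel = [[0.9, 0.1], [0.2, 0.8]]  # arbitrary fixed channel, rows are Pr(z | x)
gaps = [concavity_gap(channel, [p, 1 - p], [q, 1 - q])
        for p in (0.1, 0.4, 0.9) for q in (0.2, 0.5, 0.7)]
print(min(gaps) >= -1e-12)
```

A finite set of test points cannot prove concavity; the sketch only illustrates what condition (3.1) means for the channel-capacity problem.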

However, in the context of information causality, the channel probability is related to
both the joint probability of the NS-box and the input marginal probability. This means
that these two will be correlated if we fix the channel probability. This does not fit
our setup, in which we aim to maximize the mutual information I by varying over the joint
probability of the NS-box and the input marginal probability. For example, in the d = 2 and k = 2
case, the channel probability is given by

$$ \Pr(\beta | a_i, b = i) = \begin{pmatrix} \alpha_i & 1-\alpha_i \\ 1-\Delta_i & \Delta_i \end{pmatrix}, $$

where

$$ \alpha_0 := \Pr(\beta = 0 | a_0 = 0, b = 0) = \sum_{\ell=0}^{1} \Pr(B_y - A_x = 0 | x = \ell, y = 0) \Pr(a_1 = \ell), \qquad (3.3) $$

$$ \Delta_0 := \Pr(\beta = 1 | a_0 = 1, b = 0) = \sum_{\ell=0}^{1} \Pr(B_y - A_x = 0 | x = \ell, y = 0) \Pr(a_1 = 1-\ell), \qquad (3.4) $$

$$ \alpha_1 := \Pr(\beta = 0 | a_1 = 0, b = 1) = \sum_{\ell=0}^{1} \Pr(B_y - A_x = \ell | x = \ell, y = 1) \Pr(a_0 = \ell), \qquad (3.5) $$

$$ \Delta_1 := \Pr(\beta = 1 | a_1 = 1, b = 1) = \sum_{\ell=0}^{1} \Pr(B_y - A_x = \ell | x = \ell, y = 1) \Pr(a_0 = 1-\ell). \qquad (3.6) $$

From the above, we see that the channel probability $\Pr(\beta | a_i, b = i)$ cannot be kept fixed while varying
$\Pr(B_y - A_x | x, y)$ and $\Pr(a_i)$ independently. Similarly, for higher d and k protocols, we
will also have constraints among the above three probabilities. Thus, maximizing the
mutual information I by varying the NS-box in the context of information causality is quite
different from the usual way of finding the channel capacity.

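To achieve the goal of maximizing the mutual information I over the NS-box, we should
check whether it is a convex (or concave) optimization problem. If it is, then we can
adopt a numerical recipe such as [20] to carry out the task. Otherwise, we can either impose
more constraints on our problem or just do it by brute force. It is known [22] that one
can check whether maximizing a function $f(y_1, \ldots, y_n)$ over the $y_i$'s is a concave problem by
examining its Hessian matrix

$$ H(f) = \begin{pmatrix} \frac{\partial^2 f}{\partial y_1^2} & \frac{\partial^2 f}{\partial y_1 \partial y_2} & \cdots & \frac{\partial^2 f}{\partial y_1 \partial y_n} \\ \frac{\partial^2 f}{\partial y_2 \partial y_1} & \frac{\partial^2 f}{\partial y_2^2} & \cdots & \frac{\partial^2 f}{\partial y_2 \partial y_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial y_n \partial y_1} & \frac{\partial^2 f}{\partial y_n \partial y_2} & \cdots & \frac{\partial^2 f}{\partial y_n^2} \end{pmatrix}. \qquad (3.7) $$

For the maximization to be a concave problem, the Hessian matrix should be negative
semidefinite. That is, all the odd-order principal minors of H(f) should be non-positive and all
the even-order ones should be non-negative. Note that each first-order principal minor of H(f)
is just a second derivative of f, i.e., $\partial^2 f / \partial y_i^2$. So, the problem cannot be concave if $\partial^2 f / \partial y_i^2 > 0$ for
some i.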

With the above criterion, we can now show that the problem of maximizing I over
$\Pr(B_{\vec{y}} - A_{\vec{x}} | \vec{x}, \vec{y})$ and $\Pr(a_i)$ cannot be a concave problem. To do this, we rewrite the mutual
information I defined in (1.1) as follows:

$$ I = \sum_{i=0}^{k-1} \sum_{n=0}^{d-1} \sum_{j=0}^{d-1} \Pr(\beta = n, a_i = j | b = i) \log_2 \frac{\Pr(\beta = n, a_i = j | b = i)}{\Pr(\beta = n | b = i) \Pr(a_i = j)}. \qquad (3.8) $$

Furthermore, one can express the above in terms of $\Pr(B_{\vec{y}} - A_{\vec{x}} | \vec{x}, \vec{y})$ and $\Pr(a_i)$ by the
following relations:

$$ \Pr(\beta = n, a_i = j | b = i) = \sum_{\{a_{k \neq i}\}} \Pr(B_{\vec{y}} - A_{\vec{x}} = n - a_0 | \vec{x}, \vec{y}) \Pr(a_i = j) \prod_{k \neq i} \Pr(a_k), \qquad (3.9) $$

$$ \Pr(\beta = n | b = i) = \sum_{j=0}^{d-1} \Pr(\beta = n, a_i = j | b = i), \qquad (3.10) $$

where $\vec{x}$ and $\vec{y}$ are given by the encoding of the RAC protocol, namely,
$\vec{x} := (x_1, \ldots, x_{k-1})$ with $x_i = a_i - a_0$ and $\vec{y} := (y_1, \ldots, y_{k-1})$ with $y_i = \delta_{b,i}$ for $b \neq 0$ and $\vec{y} = \vec{0}$
for b = 0.

Moreover, both $\Pr(B_{\vec{y}} - A_{\vec{x}} | \vec{x}, \vec{y})$ and $\Pr(a_i)$ are subject to the normalization conditions
of total probability. Thus we need to solve these conditions so that the mutual information
I is expressed as a function of independent probabilities. After that, we can evaluate the
corresponding Hessian matrix to examine whether the maximization of I over these probabilities
is a concave problem.

For illustration, we first consider the d = 2 and k = 2 case. By using the relations (3.9)
and the normalization conditions of total probability to implement the chain rule while
taking derivatives, we arrive at

$$ \ln 2 \cdot \frac{\partial^2 I}{\partial \left(\Pr(B_y - A_x = 0 | x = 0, y = 0)\right)^2} = -\left(\frac{1}{\Pr(\beta = 0 | b = 0)} + \frac{1}{\Pr(\beta = 1 | b = 0)}\right) \left(\Pr(a_0 = 0)\Pr(a_1 = 0) - \Pr(a_0 = 1)\Pr(a_1 = 1)\right)^2 $$
$$ \qquad + \left(\Pr(a_0 = 0)\Pr(a_1 = 0)\right)^2 \left(\frac{1}{\Pr(\beta = 0, a_0 = 0 | b = 0)} + \frac{1}{\Pr(\beta = 1, a_0 = 0 | b = 0)}\right) $$
$$ \qquad + \left(\Pr(a_0 = 1)\Pr(a_1 = 1)\right)^2 \left(\frac{1}{\Pr(\beta = 0, a_0 = 1 | b = 0)} + \frac{1}{\Pr(\beta = 1, a_0 = 1 | b = 0)}\right). \qquad (3.11) $$

Obviously, (3.11) cannot always be negative. This can be seen easily if we set $\Pr(a_0 = 0) =
1 - \Pr(a_1 = 0)$, so that the first term on the RHS of (3.11) is zero. Then, the remaining terms are
non-negative. This indicates that maximizing I over the joint probability
is not a concave problem.
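The sign of this second derivative can be cross-checked numerically with the following sketch for d = k = 2. It builds I from the RAC relations, with arithmetic taken modulo 2, and compares a central finite difference with respect to $\Pr(B_y - A_x = 0 | x = 0, y = 0)$ against the three-term expression of (3.11). The box values in `q`, the step size and all function names are assumptions chosen for illustration.

```python
from math import log, log2


def joint_distribution(i, q, p0, p1):
    """P[n][j] = Pr(beta = n, a_i = j | b = i) for d = k = 2.

    q[x][y] = Pr(B - A = 0 | x, y); p0 = Pr(a_0 = 0); p1 = Pr(a_1 = 0).
    """
    P = [[0.0, 0.0], [0.0, 0.0]]
    for a0 in (0, 1):
        for a1 in (0, 1):
            weight = (p0 if a0 == 0 else 1 - p0) * (p1 if a1 == 0 else 1 - p1)
            x = (a1 - a0) % 2            # RAC encoding; Bob's input is y = i
            for diff in (0, 1):          # diff = B - A (mod 2)
                prob = q[x][i] if diff == 0 else 1 - q[x][i]
                beta = (diff + a0) % 2   # beta = B - A + a_0
                P[beta][(a0, a1)[i]] += weight * prob
    return P


def total_information(q, p0, p1):
    """I = sum_i I(a_i; beta | b = i) in bits."""
    total = 0.0
    for i in (0, 1):
        P = joint_distribution(i, q, p0, p1)
        p_beta = [P[n][0] + P[n][1] for n in (0, 1)]
        p_a = [P[0][j] + P[1][j] for j in (0, 1)]
        for n in (0, 1):
            for j in (0, 1):
                if P[n][j] > 0:
                    total += P[n][j] * log2(P[n][j] / (p_beta[n] * p_a[j]))
    return total


def second_derivative_numeric(q, p0, p1, h=1e-3):
    """Central finite difference of I with respect to q[0][0]."""
    def shifted(delta):
        return [[q[0][0] + delta, q[0][1]], [q[1][0], q[1][1]]]
    return (total_information(shifted(h), p0, p1)
            - 2 * total_information(q, p0, p1)
            + total_information(shifted(-h), p0, p1)) / h ** 2


def second_derivative_analytic(q, p0, p1):
    """Right-hand side of (3.11) divided by ln 2."""
    P = joint_distribution(0, q, p0, p1)
    c0, c1 = p0 * p1, (1 - p0) * (1 - p1)
    p_beta = [P[n][0] + P[n][1] for n in (0, 1)]
    rhs = (-(1 / p_beta[0] + 1 / p_beta[1]) * (c0 - c1) ** 2
           + c0 ** 2 * (1 / P[0][0] + 1 / P[1][0])
           + c1 ** 2 * (1 / P[0][1] + 1 / P[1][1]))
    return rhs / log(2)


q = [[0.8, 0.8], [0.8, 0.2]]  # example box: B - A = x.y holds with probability 0.8
print(second_derivative_numeric(q, 0.5, 0.5), second_derivative_analytic(q, 0.5, 0.5))
```

For uniform inputs the first term of (3.11) vanishes and both numbers come out positive, so the Hessian cannot be negative semidefinite at this point.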
The check for the higher d and k cases can be done similarly, and the details can be found
in Appendix B. Again, we can set all the $\Pr(a_i)$ to be uniform so that we have

$$ d^{2k} \ln 2 \cdot \frac{\partial^2 I}{\partial \left(\Pr(B_{\vec{y}} - A_{\vec{x}} = 0 | \vec{x} = 0, \vec{y} = 0)\right)^2} = \sum_{n=0}^{d-1} \left[\frac{1}{\Pr(a_0 = n, \beta = n | b = 0)} + \frac{1}{\Pr(a_0 = n, \beta = n + 1 - d | b = 0)}\right] > 0. \qquad (3.12) $$

2. Convex optimization for symmetric and isotropic channels with i.i.d. and uniform input marginal probabilities

Recall that we would like to check whether the boundaries of information causality and
quantum non-locality agree. To achieve this, we may either maximize the mutual
information I under the quantum constraint, or maximize the quantum non-locality and
then evaluate the corresponding mutual information I, which can be compared with the
bound of information causality. These two tasks are not equivalent but complementary.
However, unlike the first task, the second task is a concave problem, as known from [17, 19].
The only question in this case is whether the corresponding mutual information I is monotonically
related to the quantum non-locality. If yes, then maximizing the quantum non-locality is
equivalent to maximizing the mutual information I. The answer is partially yes, as we will
show, because this monotonic relation holds only for symmetric and isotropic channels with
i.i.d. and uniform input marginal probabilities.

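Assuming Alice's input is i.i.d., we have $H(\beta | b = i) = \log_2 d$. Also, once the channel is
symmetric, we have $\Pr(\beta = t | a_i = j, b = i) = \frac{(d-1)\xi_i + 1}{d}$ for t = j, and
$\Pr(\beta = t | a_i = j, b = i) = \frac{1-\xi_i}{d}$ for $t \neq j$. Thus, the mutual information I becomes

$$ I = k \log_2 d + \sum_{i=0}^{k-1} \left[\frac{(d-1)\xi_i + 1}{d} \log_2\left(\frac{(d-1)\xi_i + 1}{d}\right) + \left(1 - \frac{(d-1)\xi_i + 1}{d}\right) \log_2\left(\frac{1-\xi_i}{d}\right)\right]. \qquad (3.13) $$

If we also assume the channels are isotropic, i.e., $\xi_i = \xi$, then the mutual
information I can be further simplified to

$$ I = k \left[\log_2 d + \frac{(d-1)\xi + 1}{d} \log_2\left(\frac{(d-1)\xi + 1}{d}\right) + \left(1 - \frac{(d-1)\xi + 1}{d}\right) \log_2\left(\frac{1-\xi}{d}\right)\right]. \qquad (3.14) $$

The value of $\xi$ is in the interval [0, 1], with $\xi = 0$ for the completely random channel and
$\xi = 1$ for the noiseless one.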
We can show that the mutual information I is a monotonically increasing function of the
quantum non-locality parameterized by the noise parameter $\xi$. To do this, we calculate the
first and second derivatives of I with respect to $\xi$ and obtain

$$ \frac{\ln 2}{k} \frac{dI}{d\xi} = \frac{d-1}{d} \ln \frac{(d-1)\xi + 1}{1-\xi}, $$

$$ \frac{\ln 2}{k} \frac{d^2 I}{d\xi^2} = \frac{d-1}{d} \left[\frac{d-1}{(d-1)\xi + 1} + \frac{1}{1-\xi}\right]. $$

From the above, we see that $d^2 I / d\xi^2$ is always positive for $\xi \in [0, 1)$. Moreover, it is easy to
see that I is minimal at $\xi = 0$, since $dI/d\xi$ vanishes there and
$\frac{\ln 2}{k} \frac{d^2 I}{d\xi^2}\big|_{\xi=0} = d - 1 > 0$. Thus, the mutual information I is
a monotonically increasing function of $\xi$ for the symmetric and isotropic quantum channel
with i.i.d. and uniform input marginal probabilities.
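The monotonic behaviour can also be seen by evaluating (3.14) directly. The sketch below is illustrative only; the function name `isotropic_information` and the grid resolution are assumptions. It checks that I increases along a grid of $\xi$ values and evaluates the d = k = 2 case at $\xi = 1/\sqrt{2}$, the value associated with the Tsirelson bound in Section II.

```python
from math import log2, sqrt


def isotropic_information(xi, d, k):
    """Mutual information (3.14) in bits for noise parameter xi in [0, 1]."""
    p_hit = ((d - 1) * xi + 1) / d     # Pr(beta = a_i)
    p_miss = (1 - xi) / d              # probability of each wrong value
    per_channel = log2(d) + p_hit * log2(p_hit)
    if p_miss > 0:                     # 0 * log 0 is taken as 0
        per_channel += (d - 1) * p_miss * log2(p_miss)
    return k * per_channel


grid = [n / 100 for n in range(101)]
values = [isotropic_information(xi, 3, 3) for xi in grid]
print(all(lo < hi for lo, hi in zip(values, values[1:])))

# d = k = 2 at xi = 1/sqrt(2)
print(isotropic_information(1 / sqrt(2), 2, 2))
```

For d = k = 2 the value at $\xi = 1/\sqrt{2}$ is about 0.80 bits, below the information causality bound $\log_2 2 = 1$.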

IV. FINDING THE BOUND OF BELL-TYPE INEQUALITY FROM THE HIERARCHICAL SEMIDEFINITE PROGRAMMING

We now prepare for numerically evaluating the maximum of (1.5), which increases monotonically
with the mutual information I under some assumptions. In order to ensure that the
maximum of (1.5) can be obtained with quantum resources, we use the same method
as in [18, 19], where it was checked whether a given set of probabilities can be reproduced
by quantum mechanics. The test can be formulated as a hierarchy of semidefinite
programs (SDP). This is a very important issue in quantum information. With the test,
we can know the limitation for transmitting quantum information and the non-local nature
of quantum correlations.

1. Projection operators with quantum behaviors

We briefly review the basic ideas in [18, 19] and then explain how to use them in our program. In [18, 19] the projection operators are used under the following measurement scenario. Two distant parties, Alice and Bob, share an NS-box. Alice and Bob input $X$ and $Y$ to the NS-box, respectively, and obtain the corresponding outcomes $a\in A$ and $b\in B$. Here $A$ and $B$ denote the sets of all possible measurement outcomes of Alice and Bob, respectively. We use $X(a)$ and $Y(b)$ to denote the corresponding inputs. These outcomes can be associated with sets of projection operators $\{E_a : a\in A\}$ and $\{E_b : b\in B\}$. The joint probability of the NS-box can then be determined by the quantum state $\rho$ of the NS-box and the projection operators as follows:

$$\Pr(a,b) = \mathrm{Tr}(E_aE_b\rho). \tag{4.1}$$

Note that $\Pr(a,b)$ is the abbreviation of $\Pr(A_x,B_y|x,y) = \mathrm{Tr}(E_{A_x}E_{B_y}\rho)$ defined in the previous sections.

If $E_a$ and $E_b$ are genuine quantum operators, then they shall satisfy (i) hermiticity: $E_a^\dagger = E_a$ and $E_b^\dagger = E_b$; (ii) orthogonality: $E_aE_{a'} = \delta_{aa'}E_a$ if $X(a)=X(a')$ and $E_bE_{b'} = \delta_{bb'}E_b$ if $Y(b)=Y(b')$; (iii) completeness: $\sum_{a\in X}E_a = I$ and $\sum_{b\in Y}E_b = I$; and (iv) commutativity: $[E_a,E_b]=0$.

In our measurement scenario, the distant parties Alice and Bob perform local measurements, so that property (iv) holds. On the other hand, property (iii) implies no-signaling, as it leads to (1.2) via (4.1). Furthermore, this property also implies that there is redundancy in specifying Alice's operators $E_a$ with the same input, since one of them can be expressed by the others. Thus, we can eliminate one of the outcomes per setting and denote the corresponding set of the remaining outcomes for the input $X$ by $\tilde{A}_X$ (or $\tilde{B}_Y$ for Bob's outcomes with input $Y$). The collection of such measurement outcomes $\bigcup_X\tilde{A}_X$ is denoted as $\tilde{A}$. Similarly, we denote the collection of Bob's independent outcomes as $\tilde{B}$.

Using the reduced sets of projection operators $\{E_a : a\in\tilde{A}\}$ and $\{E_b : b\in\tilde{B}\}$, we can construct a set of operators $O = \{O_1, O_2, \ldots, O_i, \ldots\}$. Here $O_i$ is some linear function of products of operators in $\{I\cup\{E_a : a\in\tilde{A}\}\cup\{E_b : b\in\tilde{B}\}\}$. The set $O$ is characterized by a matrix $\Gamma$ given by

$$\Gamma_{ij} = \mathrm{Tr}(O_i^\dagger O_j\rho). \tag{4.2}$$

By construction, $\Gamma$ is non-negative definite, i.e.,

$$\Gamma \succeq 0. \tag{4.3}$$

This can be easily proved as follows. For any vector $v\in\mathbb{C}^n$ (assuming $\Gamma$ is an $n$ by $n$ matrix), one has

$$v^\dagger\Gamma v = \sum_{s,t}v_s^*\,\mathrm{Tr}(O_s^\dagger O_t\rho)\,v_t = \mathrm{Tr}(V^\dagger V\rho) \ge 0, \tag{4.4}$$

where $V = \sum_t v_tO_t$.

Recall that our goal is to judge whether a given set of joint probabilities such as (4.1) can be reproduced by quantum mechanics or not. In this prescription, the joint probabilities are encoded in the matrix $\Gamma$ satisfying the quantum constraints (4.1) and (4.3). However, $\Gamma$ contains more information than just the joint probabilities (4.1). For example, terms appearing in the elements of $\Gamma$ such as $\mathrm{Tr}(E_aE_{a'}\rho)$ and $\mathrm{Tr}(E_bE_{b'}\rho)$ for $X(a)\neq X(a')$ and $Y(b)\neq Y(b')$ cannot be expressed in terms of the joint probabilities of the NS-box. This is because these measurements are performed by the same party (either Alice or Bob) and do not commute. Therefore, to relate the joint probabilities of the NS-box to the matrix $\Gamma$, we need to find proper combinations of the $\Gamma_{ij}$ that can be expressed in terms of only the joint probabilities. That is, given the joint probabilities, there shall exist some matrix functions $F_q$ such that the matrix $\Gamma$ is constrained as follows:

$$\sum_{s,t}(F_q)_{s,t}\,\Gamma_{s,t} = g_q, \tag{4.5}$$

where the $g_q$'s are linear functions of the joint probabilities $\Pr(a,b)$.

We then call the matrix $\Gamma$ a certificate if it satisfies (4.3) and (4.5) for a given set of joint probabilities of the NS-box. The existence of the certificate will then be examined numerically by SDP. If the certificate does not exist, the joint probabilities cannot be reproduced by quantum mechanics.

Examples of how to construct $F_q$ and $g_q$ for some specific NS-box protocols can be found in [18, 19]. For illustration, here we explicitly demonstrate a case not considered in [18, 19], namely the $k=2$, $d=3$ RAC protocol. We use the notation defined in the previous sections. We start by defining the set of operators $\mathcal{S} = \{S_i\} := I\cup\{E_{A_x} : A_x\in\{0,1\}, x\in\{0,1,2\}\}\cup\{E_{B_y} : B_y\in\{0,1\}, y\in\{0,1\}\}$ with the operator label $i\in\{0,1,2,\ldots,m_a,\ldots,m_a+m_b\}$. The operator $S_{i=0}$ is the identity operator $I$, and the remaining labels run first over Alice's $m_a$ projectors and then over Bob's $m_b$ projectors.

The associated quantum constraints can be understood as the relations between the joint probability $\Pr(a,b)$ and $\mathrm{Tr}(E_a^\dagger E_b\rho)$ (or the marginal probability $\Pr(a)$ and $\mathrm{Tr}(IE_a\rho)$). That is,

$$\begin{aligned}
&\mathrm{Tr}(\rho) = 1,\quad \mathrm{Tr}(IE_{A_x}\rho) = \Pr(A_x|x),\quad \mathrm{Tr}(IE_{B_y}\rho) = \Pr(B_y|y),\\
&\mathrm{Tr}(E_{A_x}E_{A'_x}\rho) = \delta_{A_x,A'_x}\Pr(A_x|x),\quad \mathrm{Tr}(E_{B_y}E_{B'_y}\rho) = \delta_{B_y,B'_y}\Pr(B_y|y),\\
&\mathrm{Tr}(E_{A_x}E_{B_y}\rho) = \Pr(A_x,B_y|x,y).
\end{aligned} \tag{4.6}$$

Note that these equations also hold when permuting the operators, as $\mathrm{Tr}(E_{A_x}E_{B_y}\rho) = \mathrm{Tr}(E_{B_y}E_{A_x}\rho)$.

Moreover, we can make the matrix $\Gamma$ real and symmetric by redefining it as $\Gamma = (\Gamma^* + \Gamma)/2$. Thus, in the following we only display the upper triangular part of $\Gamma$. We then use the quantum constraints (4.6) to construct $F_q$ and $g_q$ by comparing them with (4.5). We see that every constraint in (4.6) yields a matrix function $F_q$ which has only one non-zero element, and also yields a function $g_q$ which is either zero or contains only a single term of a marginal or joint probability. These constraints can be further divided into four subsets labeled by $q = (q_1, q_2, q_3, q_4)$ as follows:

1. The labels $q_1, q_2\in\{0,1,\ldots,m_a+m_b\}$ are used to specify the marginal probabilities $\mathrm{Tr}(IS_{q_1}\rho)$ and $\mathrm{Tr}(S_{q_2}^\dagger S_{q_2}\rho)$. The corresponding matrix functions $F_q$ are given by $(F_{q_1})_{s,t} = \delta_{s,1}\delta_{t,q_1+1}$ and $(F_{q_2})_{s,t} = \delta_{s,q_2+1}\delta_{t,q_2+1}$, and $g_{q_1}$ and $g_{q_2}$ are the corresponding marginal probabilities.

2. The label $q_3\in\{1,\ldots,d^{k-1}+k\}$ is used to specify the probabilities associated with the orthogonal operator pairs, $\mathrm{Tr}(S_{2q_3-1}^\dagger S_{2q_3}\rho)$. The matrix element is $(F_{q_3})_{s,t} = \delta_{s,2q_3}\delta_{t,2q_3+1}$, and $g_{q_3} = 0$.

3. The label $q_4 = 4(2x+A_x) + (2y+B_y+1)\in\{1,\ldots,m_am_b\}$ is used to specify the joint probabilities of the NS-box. The corresponding $F_q$ and $g_q$ are given by $(F_{q_4})_{s,t} = \delta_{s,2x+A_x+2}\,\delta_{t,m_a+2y+B_y+2}$, and $g_{q_4} = \Pr(A_x,B_y|x,y)$.

Considering the above set of quantum constraints, we can define the associated $\Gamma$ matrix

$$\Gamma = \left(\begin{array}{ccccccccccc}
1 & \Pr(0|0)_A & \Pr(1|0)_A & \Pr(0|1)_A & \Pr(1|1)_A & \Pr(0|2)_A & \Pr(1|2)_A & \Pr(0|0)_B & \Pr(1|0)_B & \Pr(0|1)_B & \Pr(1|1)_B \\
 & \Pr(0|0)_A & 0 & X_0 & X_1 & X_2 & X_3 & \Pr(00|00) & \Pr(01|00) & \Pr(00|01) & \Pr(01|01) \\
 & & \Pr(1|0)_A & X_4 & X_5 & X_6 & X_7 & \Pr(10|00) & \Pr(11|00) & \Pr(10|01) & \Pr(11|01) \\
 & & & \Pr(0|1)_A & 0 & X_8 & X_9 & \Pr(00|10) & \Pr(01|10) & \Pr(00|11) & \Pr(01|11) \\
 & & & & \Pr(1|1)_A & X_{10} & X_{11} & \Pr(10|10) & \Pr(11|10) & \Pr(10|11) & \Pr(11|11) \\
 & & & & & \Pr(0|2)_A & 0 & \Pr(00|20) & \Pr(01|20) & \Pr(00|21) & \Pr(01|21) \\
 & & & & & & \Pr(1|2)_A & \Pr(10|20) & \Pr(11|20) & \Pr(10|21) & \Pr(11|21) \\
 & & & & & & & \Pr(0|0)_B & 0 & X_{12} & X_{13} \\
 & & & & & & & & \Pr(1|0)_B & X_{14} & X_{15} \\
 & & & & & & & & & \Pr(0|1)_B & 0 \\
 & & & & & & & & & & \Pr(1|1)_B
\end{array}\right) \tag{4.7}$$

, where the $\Pr(A_x|x)_A$'s and $\Pr(B_y|y)_B$'s are the marginal probabilities for Alice and Bob, respectively, and the $\Pr(A_x,B_y|x,y)$'s are the joint probabilities of the NS-box. The elements $X_i$ in the above cannot be determined from the given marginal and joint probabilities because they correspond to products of measurements with different settings for only one party. Thus, they cannot appear in the constraints (4.5) but are still constrained by the non-negative definiteness of $\Gamma$.
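The bookkeeping behind this certificate can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: Alice is taken to have $d^{k-1}$ settings and Bob $k$ settings, each with $d-1$ retained outcomes (as suggested by the label ranges above), and the function name `gamma1_pattern` and the entry labels are hypothetical. It classifies every upper-triangular entry of the $n=1$ certificate as a fixed number, a marginal, a joint probability, a zero from orthogonality, or an unknown $X_i$.

```python
from collections import Counter

def gamma1_pattern(d: int = 3, k: int = 2):
    """Classify the upper-triangular entries of the n = 1 certificate.

    Operator order (0-based): identity, Alice's projectors, Bob's projectors.
    """
    alice = [("A", x, a) for x in range(d ** (k - 1)) for a in range(d - 1)]
    bob = [("B", y, b) for y in range(k) for b in range(d - 1)]
    ops = [("I", None, None)] + alice + bob
    pattern = {}
    for s, (party_s, setting_s, _) in enumerate(ops):
        for t in range(s, len(ops)):
            party_t, setting_t, _ = ops[t]
            if s == 0 and t == 0:
                kind = "one"        # Tr(rho) = 1
            elif s == 0 or s == t:
                kind = "marginal"   # Tr(I E rho) or Tr(E E rho)
            elif party_s != party_t:
                kind = "joint"      # Tr(E_a E_b rho)
            elif setting_s == setting_t:
                kind = "zero"       # orthogonal projectors of one setting
            else:
                kind = "unknown"    # the X_i entries
            pattern[(s, t)] = kind
    return ops, pattern

if __name__ == "__main__":
    ops, pattern = gamma1_pattern()
    print(len(ops), Counter(pattern.values()))
```

For $d=3$, $k=2$ this gives an $11\times11$ matrix with 24 joint-probability entries and 16 unknown entries, in line with the labels $X_0,\ldots,X_{15}$ and the range of $q_4$.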
Testing the existence of the certificate: The task of testing the existence of the certificate can be formulated as an SDP by defining the standard primal and the associated dual problems. The details can be found in Appendix C. The primal problem of an SDP is subject to certain conditions associated with a positive semidefinite matrix, which can be either linear equalities or inequalities. Each primal problem has an associated dual problem, and when the optimal value of the primal problem is the same as the optimal value of the dual problem, the optimal solution of the problem is obtained.

For our case the primal problem of the SDP is as follows:

$$\begin{aligned}
\text{maximize}\quad & \lambda && (4.8a)\\
\text{subject to}\quad & \mathrm{Tr}(F_q^T\Gamma) = g_q,\quad q=1,\ldots,m, && (4.8b)\\
& \Gamma - \lambda I \succeq 0. && (4.8c)
\end{aligned}$$

Obviously, if the maximal value $\lambda\ge0$ is obtained, the non-negative definiteness (4.3) of $\Gamma$ is guaranteed under the quantum constraints (4.8b).

On the other hand, the associated dual problem is given by

$$\begin{aligned}
\text{maximize}\quad & \sum_q y_qg_q, && (4.9a)\\
\text{subject to}\quad & \sum_q y_qF_q^T \succeq 0, && (4.9b)\\
& \sum_q y_q\,\mathrm{Tr}(F_q^T) = 1. && (4.9c)
\end{aligned}$$

Note that the quantity $\sum_q y_qg_q$ is the object characterizing the quantum non-locality, since the $g_q$'s are mainly the two-point correlation functions. Therefore, maximizing this quantity is equivalent to finding the generalized Tsirelson bound: if the solution of this SDP is feasible, then the associated certificate exists and yields the generalized Tsirelson bound.



2. Hierarchy of the semidefinite programming

Different operator sets $O$ yield different quantum constraints (4.1) and (4.3). There seems to be no guideline for choosing the set $O$ and examining the existence of the corresponding certificate. However, it is easy to see that the certificates associated with linearly equivalent operator sets are equivalent. This can be seen as follows. Let us assume $O$ and $O'$ are two linearly equivalent sets of operators such that each $O_i\in O$ can be expressed as a linear combination of the elements in $O'$, i.e., $O_i = \sum_jC_{ij}O'_j$. If there exists a matrix $\Gamma'$ satisfying (4.3) and (4.5) for the operator set $O'$, then there will exist another matrix $\Gamma$ whose elements are $\Gamma_{s,t} = \sum_{q,l}C^*_{s,q}\,\Gamma'_{q,l}\,C_{t,l}$, also satisfying (4.3) and (4.5) for the set $O$. Therefore, we only need to stick to one set of operators in this linear equivalence class when examining the existence of the corresponding certificate.

Besides, a systematic way of constructing $O$ is proposed in [18, 19], so that the task of finding the certificate can be formulated as a hierarchy of SDPs. This is constructed as follows. The length of the operator $O_i$, denoted by $|O_i|$, is defined as the minimal number of projectors used to construct it. We can then divide the set $O$ into different subsets labeled by the maximal length of the operators in the corresponding subset. Thus, we decompose the operator set $O$ into a sequence of hierarchical operator sets denoted by $S_n$, where $n$ is the maximal length of the operators in $S_n$. That is,

$$\begin{aligned}
S_0 &= \{I\},\\
S_1 &= \{S_0\}\cup\{E_a : a\in\tilde{A}\}\cup\{E_b : b\in\tilde{B}\},\\
S_2 &= \{S_0\}\cup\{S_1\}\cup\{E_aE_{a'} : a,a'\in\tilde{A}\}\cup\{E_bE_{b'} : b,b'\in\tilde{B}\}\cup\{E_aE_b : a\in\tilde{A}, b\in\tilde{B}\},\\
&\;\;\vdots
\end{aligned} \tag{4.10}$$

Furthermore, to save the computer memory used in the numerical SDP algorithm, we can add to the above sequence an intermediate set between $S_n$ and $S_{n+1}$, which is given by $S_{n+AB} := \{S_n\}\cup\{S\in S_{n+1}\,|\,S = E_aE_bS' : a\in\tilde{A}, b\in\tilde{B}\}$. For example, when $n=1$ we have $S_{1+AB} = \{S_1\}\cup\{E_aE_b : a\in\tilde{A}, b\in\tilde{B}\}$ such that $S_1\subset S_{1+AB}\subset S_2$. Note that $S_{1+AB}$ does not contain the products of the marginal projection operators of the forms $\{E_aE_{a'} : a,a'\in\tilde{A}\}$ and $\{E_bE_{b'} : b,b'\in\tilde{B}\}$. All the operators in $O$ can be expressed as linear combinations of the operators in $S_n$ for large enough $n$.
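To get a feeling for how quickly the certificates grow along this sequence, one can count the operators in each set. The sketch below is an estimate under explicit assumptions rather than the counting of Appendix D: Alice is assumed to have $d^{k-1}$ settings and Bob $k$ settings with $d-1$ retained projectors each, products of two projectors of the same party and the same setting are assumed to add nothing new (they are either zero or a single projector), and the name `hierarchy_sizes` is hypothetical.

```python
def hierarchy_sizes(d: int, k: int) -> dict:
    """Number of operators in S_1, S_{1+AB} and S_2 (certificate side lengths)."""
    settings_a, settings_b = d ** (k - 1), k   # assumed numbers of inputs
    m_a = settings_a * (d - 1)                 # Alice's retained projectors
    m_b = settings_b * (d - 1)                 # Bob's retained projectors
    s1 = 1 + m_a + m_b                         # identity plus single projectors
    s1_ab = s1 + m_a * m_b                     # add the products E_a E_b
    # ordered products of two projectors of one party with different settings
    aa = m_a * m_a - settings_a * (d - 1) ** 2
    bb = m_b * m_b - settings_b * (d - 1) ** 2
    return {"S1": s1, "S1+AB": s1_ab, "S2": s1_ab + aa + bb}

if __name__ == "__main__":
    for d, k in [(2, 2), (3, 2), (3, 3), (5, 3)]:
        print(d, k, hierarchy_sizes(d, k))
```

Since the certificate associated with a set of $N$ operators is an $N\times N$ matrix, the memory needed grows roughly with the square of these counts, which is the motivation for inserting the intermediate level.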

Since we know $S_n\subset S_{n+AB}\subseteq S_{n+1}$, the constraints produced by $S_{n+1}$ are stronger than those produced by $S_{n+AB}$ and $S_n$. We can start the task from $S_1$, then $S_{1+AB}$, $S_2$ and so on. Let the certificate matrix associated with the set $S_n$ be denoted as $\Gamma^{(n)}$. Finding the certificates associated with this sequence can be formulated as a hierarchical SDP. Once the given joint probabilities satisfy the quantum constraints (4.3) so that the associated certificate $\Gamma^{(n)}$ exists, we denote the collection of these joint probabilities as $Q_n$. Since the associated constraints are stronger than in the previous steps of the hierarchical sequence, the collection $Q_n$ becomes smaller for higher $n$. That is, the non-quantum correlations will definitely fail the test at some step in the hierarchical SDP. The geometrical interpretation of the above fact is depicted in Fig. 3.

FIG. 3. The geometric interpretation of collection $Q_n$.

It was shown in [18, 19] that the probability is ensured to be quantum only when the certificate associated with $S_{n\to\infty}$ exists, i.e., for the joint probabilities in the collection $Q$ of Fig. 3. In this sense, it seems that we have to check infinitely many steps. To cure this, a stopping criterion is proposed in [18, 19] to terminate the check at some step of the hierarchical SDP. This can certify at finite $n$ that the given joint probabilities are quantum, if they are.

The stopping criterion is satisfied when the rank of the sub-matrix $\Gamma^{(n)}_{X,Y}$ of $\Gamma^{(n)}$ is equal to the rank of $\Gamma^{(n)}$, i.e.,

$$\mathrm{rank}(\Gamma^{(n)}_{X,Y}) = \mathrm{rank}(\Gamma^{(n)}). \tag{4.11}$$

The elements of $\Gamma^{(n)}_{X,Y}$ are constructed from the operators in the set $S_{X,Y} := \{S_{n-1}\}\cup\{S = E_aE_bS' : a\in\tilde{A}_X, b\in\tilde{B}_Y, |S|\le n\}$.

The above stopping criterion is for integer $n$. However, it was also generalized in [19] to the intermediate certificate $\Gamma^{(n+AB)}$: the stopping criterion is satisfied if the following equation holds for all the measurement settings $X$ and $Y$,

$$\mathrm{rank}(\Gamma^{(n+XY)}) = \mathrm{rank}(\Gamma^{(n+AB)}), \tag{4.12}$$

so that the certificate $\Gamma^{(n+AB)}$ has a rank loop. Here $\Gamma^{(n+XY)}$ is the certificate associated with $S_{n+XY} := \{S_n\}\cup\{S\in S_{n+1}\,|\,S = E_aE_bS' : a\in\tilde{A}_X, b\in\tilde{B}_Y\}$.

Now we are ready to implement the above criterion to numerically examine the quantum behaviors of the given joint probabilities for our RAC protocols with higher $k$ and $d$.

3. The bound of Bell-type inequality and the corresponding mutual information in the hierarchical semidefinite programming

Since any Bell-type inequality, including (1.5), can be written as a linear combination of joint probabilities, the hierarchical SDP can be used to approach the quantum bound of the Bell-type inequality (the Tsirelson bound). Recall that the quantum non-locality and the mutual information $I$ are monotonically related for symmetric and isotropic channels with i.i.d. and uniform input marginal probabilities. After obtaining the maximum of the Bell-type function at each step of the aforementioned hierarchical SDP, we can calculate the corresponding mutual information $I$ and compare it with the bound from information causality. Since the quantum constraints become stronger along the hierarchical SDP, the collection $Q_n$ becomes smaller as $n$ increases. We then know that the bound of the Bell-type inequality and the associated mutual information $I$ become tighter for larger $n$ and converge to the quantum bound for large enough $n$. Once the bound of the mutual information $I$ at some step of the hierarchy does not saturate the information causality, we can infer that the quantum bound of the mutual information will not saturate the information causality, either.

First, let us discuss how to find the bound of the Bell-type inequality. As discussed before, the problem of finding the Tsirelson bound can be reformulated as an SDP. The primal problem of this SDP is defined as

$$\begin{aligned}
\text{maximize}\quad & \mathrm{Tr}(C^T\Gamma^{(n)}) && (4.13a)\\
\text{subject to}\quad & \mathrm{Tr}(F_q^T\Gamma^{(n)}) = g_q(P),\quad q=1,\ldots,m; && (4.13b)\\
& \Gamma^{(n)} \succeq 0; && (4.13c)\\
& \mathrm{Tr}(H_w^T\Gamma^{(n)}) \ge 0,\quad w=1,\ldots,s. && (4.13d)
\end{aligned}$$

The matrix $C$ is chosen to make $\mathrm{Tr}(C^T\Gamma^{(n)})$ the Bell-type function which we would like to maximize. Eqs. (4.13b) and (4.13c) are the quantum constraints discussed in the previous subsections, so that the quantum behaviors are ensured during the SDP procedure. Moreover, with a proper choice of the matrices $H_w$ (see footnote 5), the condition (4.13d) is introduced to ensure the non-negativity of the joint probabilities, which are the off-diagonal elements of $\Gamma^{(1)}$.

In the following we define the matrix $C$ for our case. Eq. (1.5), which can be expressed as a linear combination of the joint probabilities, i.e., $\sum_{x,y}\Pr(B_y - A_x = x\cdot y|x,y)$, is the objective of our SDP (4.13). Since we only consider $d-1$ marginal probabilities per measurement setting, we should further rewrite our objective according to the completeness conditions, i.e., $\sum_{a\in X}E_a = I$ and $\sum_{b\in Y}E_b = I$. After rewriting, we can write down the matrix $C$ in (4.13). We take the $d=3$, $k=2$ RAC protocol as an example. For $\Gamma^{(1)}$,



/ 



C = 



1 



1. 

2. 



2. 0. 0. -1. 1. 



\ 



0. 0. 0. 0. 0. 



0. 0. 0. 0. 
0. 0. 0. 0. 



-1. 0. 0. 
1. 0. 0. 



0. 
0. 



0. 0. 
0. 

0. 0. 

0. 0. 



1. 

0. 
3. 
0. 



0. 0. 0. 0. 

-1. 1. -I. 1. 



0. 



V 



(4.14) 



1. 0. 3. 0. 0. 
0. -1. -2. -1. -2. 
0. 1. -1. 1. -1. 
0. 0. -1. -2. 2. 1. 
0. 1. -1. 1. 2. 
0. -1. -2. -1. 1. 
0. 1. -1. -2. -1. 
-1. 1. 0. 0. 0. 0. 
-2. -1. -2. -1. -2. -1. 0. 0. 0. 0. 
-I. 1. 2. 1. -1. -2. 0. 0. 0. 0. 
0. -2. -1. 1. 2. 1. -1. 0. 0. 0. 0. 
The size of (4.14) is equal to the size of $\Gamma^{(1)}$ (the first step in our hierarchical SDP). If $n\neq1$, the size of the matrix $C$ will be bigger; we then take (4.14) as a sub-matrix of $C$ and set the other elements of $C$ to zero, such that the objective functions $\mathrm{Tr}(C^T\Gamma^{(n)})$ are all equal for different steps of our hierarchical SDP.

For higher $d$ and $k$, we write down the quantum constraints (4.3) for $\Gamma^{(1)}$ and $\Gamma^{(1+AB)}$ and estimate their number in Appendix D. However, due to the limitation of the computer memory (we have 128 GB), we cannot finish all the tests of our hierarchical SDP but stop at the level of $1+AB$. In our calculation, we take $\sum_{x,y}\Pr(B_y - A_x = x\cdot y|x,y)$ as the objective of the SDP, which is monotonically related to the object of communication complexity $\sum_y\xi_y$ in a straightforward way via (2.10).

5 Since we only consider $a\in\tilde{A}$ and $b\in\tilde{B}$ to save computer memory, we need to choose $H_w$ to ensure the non-negativity of not only the $(d-1)^2$ terms of $\Gamma^{(1)}$ but also the other $d^2-(d-1)^2$ terms, which are linear combinations of the elements of $\Gamma^{(1)}$.

At the $n=1$ level the numerical results of our SDP objective $\sum_{x,y}\Pr(B_y - A_x = x\cdot y|x,y)$ for various $k$ and $d$ are listed below:

k     d=2        d=3         d=4         d=5
2     3.4142     4.8284      6.2426      7.6569
3     9.4641     19.3923     32.7846     49.6410
4     24.0000    72.0000     160.0000
5     57.8885    255.7477

The entries in the table are the values of $\sum_{x,y}\Pr(B_y - A_x = x\cdot y|x,y)$.

Similarly, at the $n=1+AB$ level the results for the same SDP objective are listed below:

k     d=2        d=3         d=4         d=5
2     3.4142     4.6667      5.9530      7.1789
3     9.4641     18.6633
4     24.0000
5     57.8885


The stopping criterion is checked at the same time. Unfortunately, it is not satisfied for $\Gamma^{(1+AB)}$; this means that the bound associated with $\Gamma^{(1+AB)}$ is not the Tsirelson bound. However, our numerical computational capacity cannot afford the higher-level calculations.

A few more remarks are in order: (i) Even though we do not require our channel to be isotropic, i.e., uniform $\xi_i$, for our SDP, we find that the channel maximizing the SDP objective is isotropic for our $n=1$ and $n=1+AB$ checks. (ii) We find that the bound at the $n=1$ level is the same as the bound derived from the signal decay theorem in section II. (iii) For the $d=2$ case, the bounds for the SDP objective at the $n=1$ and $n=1+AB$ levels are equal, and they coincide with the Tsirelson bound, as can be proved by Tsirelson's theorem [17]. Since the bound is the Tsirelson bound, it will not change for the further steps of the hierarchical SDP. (iv) For $d>2$, the bound of the SDP objective at the $n=1+AB$ level becomes tighter than the one at the $n=1$ level, as expected. However, more numerical effort is needed to arrive at the true tight bound for the quantum non-locality, i.e., the generalized Tsirelson bound.
Since the optimal channel for the above SDP procedure is isotropic, we can then obtain the value of the noise parameter $\xi$ and use (3.14) to evaluate the corresponding mutual information $I$.

At the $n=1$ level,

                         d=2       d=3       d=4       d=5
Information causality    1.0000    1.5850    2.0000    2.3220
k=2                      0.7982    1.3547    1.7845    2.1357
k=3                      0.7680    1.3360    1.7895    2.1680
k=4                      0.7549    1.3333    1.8048
k=5                      0.7476    1.3345

The entries are the corresponding mutual information $I$ given by (3.14).
At the $n=1+AB$ level,

                         d=2       d=3       d=4       d=5
Information causality    1.0000    1.5850    2.0000    2.3220
k=2                      0.7982    1.1972    1.5478    1.7788
k=3                      0.7680    1.1531

Note that our results support the information causality. This is because the maximal mutual information $I$ evaluated from the joint probabilities constrained by the $n=1$ certificates is already smaller than the bound from the information causality. Thus, as implied by the geometric picture of Fig. 3, the quantum bound on the mutual information $I$ obtained in the large $n$ limit will also satisfy the information causality, at least for the symmetric and isotropic channels with i.i.d. and uniform input marginal probabilities. Moreover, for $d=2$ the maximal mutual information $I$ from the certificates decreases as $k$ increases, whereas the tabulated $n=1$ values for $d\ge3$ are not monotonically decreasing in $k$. However, it is hard to find the quantum bound of the mutual information $I$ exactly because the stopping criterion fails at the $n=1+AB$ level. More checks with higher-$n$ certificates are needed to arrive at the quantum bound of the mutual information $I$; we will not carry out this task due to the limitation of computational power.
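The conversion from the SDP objective to the mutual information can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes that the sum over $(x,y)$ contains $k\,d^{k-1}$ terms and that, for the isotropic optimum, every term equals $[(d-1)\xi+1]/d$; both assumptions are consistent with the tabulated values but are not stated explicitly in the text. The function names are hypothetical, and the information causality bound is taken as $\log_2 d$, as in the first row of the tables.

```python
import math

def xi_from_objective(value: float, d: int, k: int) -> float:
    """Noise parameter of the isotropic channel reproducing a given SDP objective."""
    n_terms = k * d ** (k - 1)            # assumed number of (x, y) terms in the sum
    p_success = value / n_terms           # common value of Pr(B_y - A_x = x.y | x, y)
    return (d * p_success - 1) / (d - 1)  # invert p = ((d - 1) xi + 1) / d

def mutual_information(xi: float, d: int, k: int) -> float:
    """Mutual information (3.14) of the isotropic, symmetric channel."""
    p_hit = ((d - 1) * xi + 1) / d
    p_miss = (1 - xi) / d
    per_channel = math.log2(d) + p_hit * math.log2(p_hit)
    if p_miss > 0:                        # assumed convention: 0 * log2(0) = 0
        per_channel += (1 - p_hit) * math.log2(p_miss)
    return k * per_channel

# A few n = 1 objective values copied from the table, keyed by (d, k).
N1_OBJECTIVE = {(2, 2): 3.4142, (3, 2): 4.8284, (2, 3): 9.4641, (3, 4): 72.0}

if __name__ == "__main__":
    for (d, k), value in N1_OBJECTIVE.items():
        xi = xi_from_objective(value, d, k)
        info = mutual_information(xi, d, k)
        print(f"d={d} k={k} xi={xi:.4f} I={info:.4f} bound={math.log2(d):.4f}")
```

Under these assumptions the $n=1$ objective values correspond to $\xi = 1/\sqrt{k}$, and the resulting mutual information stays below $\log_2 d$ in every case checked.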
V. MAXIMIZING MUTUAL INFORMATION FOR GENERAL QUANTUM COMMUNICATION CHANNELS



Most of the RAC protocols discussed so far and in the literature are under some assumptions, such as i.i.d. and uniform input marginal probabilities for symmetric and isotropic channels. If we want to test the information causality, we should maximize the mutual information $I$ for more general quantum communication channels.

Recall from the proof in section III that we cannot formulate the problem of maximizing the mutual information $I$ over the joint and the input marginal probabilities of the NS-box as a convex optimization problem. Thus, for more general quantum communication channels, we are forced to solve the problem by brute force. The procedure is as follows. Firstly, we divide the defining domains of the joint and input marginal probabilities into many fine points. We then pick the points satisfying the consistency relations for a given communication channel (protocol). Secondly, we test whether these joint probabilities can be reproduced by quantum mechanics or not. If they can, we evaluate the corresponding mutual information $I$. Thirdly, by comparing these values of the mutual information $I$, we obtain the maximal one and check whether the information causality is satisfied or not. By this brute-force method, we obtain the distribution of the mutual information $I$ over the joint and the input marginal probabilities produced by quantum mechanics. This yields far more than just the maximal mutual information consistent with quantum mechanics. The price to pay is a longer computing time. Due to the restriction of computing power, we can only treat the $d=2$, $k=2$ case.

We start the discussion of more general communication channels by fixing either the joint probabilities $\Pr(B_y - A_x|x,y)$ or the input marginal probabilities $\Pr(a_i)$. Firstly, we assume that the input probabilities are i.i.d. and uniform, such that we can take the CHSH function as the object of quantum communication complexity. We can therefore study the relation between the mutual information and the quantum communication complexity. Note that, when requiring our communication channel (3.3) to have i.i.d. and uniform input marginal probabilities, the channel between $a_i$ and $\beta$ becomes symmetric automatically. Secondly, in order to study the influence of the input marginal probabilities $\Pr(a_i)$ on the mutual information, we pick three sets of the joint probabilities $\Pr(B_y - A_x|x,y)$ constrained by quantum mechanics and then evaluate the corresponding mutual information with different input marginal probabilities $\Pr(a_i)$. Besides these communication channels, in order to test whether the information causality is always satisfied, we will consider the most general quantum communication channel; namely, we do not impose any condition on the communication channel except the quantum constraints on the joint probabilities.

Before evaluating the corresponding mutual information, the chosen joint probabilities $\Pr(B_y - A_x|x,y)$ should pass a test. For the $d=2$, $k=2$ RAC protocol, the quantum constraint is as follows:

$$G = \begin{pmatrix} 1 & \theta_1 & C_{0,0} & C_{0,1}\\ \theta_1 & 1 & C_{1,0} & C_{1,1}\\ C_{0,0} & C_{1,0} & 1 & \theta_2\\ C_{0,1} & C_{1,1} & \theta_2 & 1 \end{pmatrix} \succeq 0, \tag{5.1}$$

where $C_{x,y} := (-1)^{xy}[2\Pr(B_y - A_x = xy|x,y) - 1]$ is the correlation function of the measurement settings $x$, $y$ for Alice and Bob, respectively. The condition was pointed out in [14, 17-19] and can be derived as the necessary and sufficient condition for the quantum correlation functions $C_{x,y}$ (or equivalently the joint probabilities $\Pr(B_y - A_x|x,y)$) by Tsirelson's theorem [15]. Actually, $G$ is a sub-matrix of the $n=1$ certificate $\Gamma^{(1)}$. Due to the positivity, (5.1) is satisfied once $\Gamma^{(1)}\succeq0$.

Since the condition (5.1) involves a positive semidefinite matrix, we need numerical methods to check it. When the joint probabilities are not fixed in the communication channel (protocol), we have to pick many sets of joint probabilities from their defining domains, and it is not efficient to test all these sets by SDP. Therefore, instead of condition (5.1) we use a set of inequalities that are linear in $\arcsin C_{x,y}$ to test whether the joint probabilities can be produced by quantum mechanics or not. In this way, the test becomes simpler and more efficient. The inequalities are [13, 16]

$$\begin{aligned}
|\arcsin C_{0,0} + \arcsin C_{0,1} + \arcsin C_{1,0} - \arcsin C_{1,1}| &\le \pi, && (5.2a)\\
|\arcsin C_{0,0} + \arcsin C_{0,1} - \arcsin C_{1,0} + \arcsin C_{1,1}| &\le \pi, && (5.2b)\\
|\arcsin C_{0,0} - \arcsin C_{0,1} + \arcsin C_{1,0} + \arcsin C_{1,1}| &\le \pi, && (5.2c)\\
|-\arcsin C_{0,0} + \arcsin C_{0,1} + \arcsin C_{1,0} + \arcsin C_{1,1}| &\le \pi. && (5.2d)
\end{aligned}$$

Actually, the condition (5.2) is equivalent to (5.1): if the inequalities (5.2) are satisfied, we can find valid $\theta_1$ and $\theta_2$ to make condition (5.1) satisfied, and vice versa [14, 18, 19].
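The test (5.2) is cheap to evaluate, which is what makes the scan over many candidate probabilities practical. A minimal sketch is given below; the function names `correlator` and `is_quantum` and the small tolerance `tol` (added to absorb floating-point rounding on the boundary) are assumptions of this illustration.

```python
import math

def correlator(x: int, y: int, prob: float) -> float:
    """C_{x,y} = (-1)^{xy} [2 Pr(B_y - A_x = xy | x, y) - 1]."""
    return (-1) ** (x * y) * (2 * prob - 1)

def is_quantum(c00: float, c01: float, c10: float, c11: float,
               tol: float = 1e-12) -> bool:
    """Check the four inequalities (5.2) up to a floating-point tolerance."""
    angles = [math.asin(c) for c in (c00, c01, c10, c11)]
    total = sum(angles)
    # Each inequality flips the sign of exactly one arcsin term,
    # i.e. it bounds |total - 2 * angle| by pi.
    return all(abs(total - 2 * angle) <= math.pi + tol for angle in angles)

if __name__ == "__main__":
    s = 1 / math.sqrt(2)
    print("Tsirelson point:", is_quantum(s, s, s, -s))
    print("PR box:", is_quantum(1, 1, 1, -1))
```

The correlators $(1/\sqrt2, 1/\sqrt2, 1/\sqrt2, -1/\sqrt2)$ saturate (5.2a) and pass the test, while the PR-box correlators $(1,1,1,-1)$ fail it.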
Once the corresponding correlation functions $C_{x,y}$ satisfy (5.2), we know that these joint probabilities $\Pr(B_y - A_x|x,y)$ can be reproduced by a quantum system. But we have to notice that some of them could also be described by a local hidden variable model, which means that the shared correlation is local. Since the bound of communication complexity for local correlations is different from that for quantum non-local ones, we can use the communication complexity to distinguish them.

1. Symmetric channels with i.i.d. and uniform input marginal probabilities

We start with the simplest case: the $d=2$, $k=2$ RAC protocol with symmetric channels and i.i.d., uniform input marginal probabilities. In this case, the success probability for Bob to guess Alice's database correctly is equivalent to the CHSH function, i.e., $|C_{0,0}+C_{0,1}+C_{1,0}-C_{1,1}|$. Thus, we can take the CHSH function as the object of communication complexity. Moreover, using the CHSH function and its three symmetric partners obtained by shifting the minus sign, we can check whether the shared correlations can be described by a local hidden variable model. If the corresponding values of all these functions are no larger than 2, the shared correlation is local. Otherwise, the shared correlation is quantum non-local or beyond; the latter happens when some of these values are larger than $2\sqrt{2}$, which is nothing but the Tsirelson bound. When the Tsirelson bound is reached, the quantum non-locality between the two parties (Alice and Bob) is maximal.

In our numerical calculations, we divide the defining domain of the joint probabilities $\Pr(B_y - A_x|x,y)$ into 100 points. Following the procedure of our brute-force method, we obtain the distribution of the mutual information $I$ over the quantum communication complexity, as shown in Fig. 4, for symmetric channels with i.i.d. and uniform input marginal probabilities. Note that the quantum communication complexity here (the x-axis of Fig. 4) is characterized by the value of the CHSH function, $|C_{0,0}+C_{0,1}+C_{1,0}-C_{1,1}|$. In Fig. 4, all the points satisfy the quantum constraint (5.2). We use the red color to denote the points which can also be obtained by local correlations, i.e., those for which the values of the CHSH function and its three symmetric partners are all no larger than 2. Moreover, it seems that the distribution of the mutual information over the quantum communication complexity as shown in Fig. 4 is not continuous. This is not the case; it appears so only because we did not partition the defining domain of the joint probability finely enough. In Fig. 5 we partition the defining domain of the joint probability more finely in the top region of Fig. 4 and show that the empty region in Fig. 4 is now filled. Similarly, the empty region at the top of Fig. 5 could be filled again by an even finer partitioning.

FIG. 4. Mutual information vs. (quantum) communication complexity for the $d=2$, $k=2$ RAC protocol with i.i.d. and uniform input marginal probabilities. Here, the (quantum) communication complexity is characterized by the CHSH function. The red part can also be achieved by sharing local correlations.


The results in Fig. 4 are consistent with the information causality, since the maximal mutual information for the local or quantum correlations is bounded by 1, the bound suggested by information causality. However, the peculiar part of Fig. 4 is that some of the local correlations can achieve a larger mutual information than $I\approx0.8$, which is achieved by the correlations with the maximal quantum communication complexity. This peculiar part is the red region above $I\approx0.8$ in Fig. 4. In particular, the maximal mutual information $I=1$ is reached when the shared correlation is marginally non-local, i.e., when the communication complexity is equal to 2. This indicates that the mutual information is not monotonically related to the quantum communication complexity. Put another way, more quantum non-locality does not always yield more mutual information. We think it is interesting to understand this phenomenon in future work.
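The brute-force scan behind this observation can be imitated on a coarse grid. The sketch below is a simplified stand-in for the actual computation and rests on an assumption that is not spelled out in this section: with i.i.d. uniform inputs, Bob is taken to guess $a_y$ correctly with probability equal to the average of $\Pr(B_y - A_x = xy|x,y)$ over $x$, so that $I = \sum_y[1 - h(\cdot)]$ with $h$ the binary entropy. All function names, the grid size, and the tolerances are hypothetical.

```python
import math
from itertools import product

def h2(p: float) -> float:
    """Binary entropy in bits, with 0 * log2(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def correlators(p: dict) -> dict:
    """p[(x, y)] = Pr(B_y - A_x = xy | x, y) -> C_{x,y} as defined below (5.1)."""
    return {(x, y): (-1) ** (x * y) * (2 * p[(x, y)] - 1) for x, y in p}

def is_quantum(c: dict, tol: float = 1e-9) -> bool:
    """Inequalities (5.2), with inputs clamped to [-1, 1] against rounding."""
    angles = [math.asin(max(-1.0, min(1.0, v))) for v in c.values()]
    return all(abs(sum(angles) - 2 * a) <= math.pi + tol for a in angles)

def chsh_family(c: dict) -> list:
    """CHSH function and its three partners (minus sign on each term in turn)."""
    vals = list(c.values())
    return [abs(sum(vals) - 2 * v) for v in vals]

def mutual_information(p: dict) -> float:
    # Assumed channel model: success probability for a_y is the x-average.
    return sum(1 - h2((p[(0, y)] + p[(1, y)]) / 2) for y in (0, 1))

def scan(steps: int = 10):
    """Return (max I, CHSH value there, locality flag) over the quantum grid points."""
    grid = [i / steps for i in range(steps + 1)]
    best = None
    for p00, p01, p10, p11 in product(grid, repeat=4):
        p = {(0, 0): p00, (0, 1): p01, (1, 0): p10, (1, 1): p11}
        c = correlators(p)
        if not is_quantum(c):
            continue
        info = mutual_information(p)
        if best is None or info > best[0] + 1e-12:
            chsh = abs(c[(0, 0)] + c[(0, 1)] + c[(1, 0)] - c[(1, 1)])
            best = (info, chsh, max(chsh_family(c)) <= 2 + 1e-9)
    return best

if __name__ == "__main__":
    print(scan())
```

On this grid the largest mutual information is found at a local point with CHSH value 2, while the Tsirelson point (all four probabilities equal to $\frac12(1+1/\sqrt2)$) gives $I\approx0.80$, in line with the discussion of Fig. 4.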

From these symmetric quantum channels with i.i.d. and uniform input marginal probabilities, we pick the isotropic ones ($\xi_0 = \xi_1$) and obtain Fig. 6. It shows that the mutual information $I$ and the quantum communication complexity are monotonically related. This explicitly demonstrates what we have discussed in the previous section.

FIG. 5. Some points near the top region in Fig. 4.

FIG. 6. Mutual information vs. (quantum) communication complexity for isotropic channels with i.i.d. and uniform input marginal probabilities.

2. Channels with non-uniform input marginal probabilities

In the above communication channels, the input marginal probabilities are fixed to be
i.i.d. and uniform. Now we would like to demonstrate the effect of non-uniform input
marginal probabilities. In this case, we vary the input marginals but keep the joint
probabilities fixed. To see this effect for different channels, we consider three different
sets of joint probabilities, corresponding to (i) a symmetric, (ii) a symmetric and
isotropic, and (iii) an asymmetric channel.

To be more specific, for case (i) the joint probabilities are constrained by
$\Pr(B_y - A_x = 0 | x, y = 0) = 1$ and $\Pr(B_y - A_x = xy | x, y = 1) = \frac{1}{2}$ for $x = 0, 1$,
such that the noise parameters are given by $\xi_0 = 1$ and $\xi_1 = 0$. For case (ii) all the
joint probabilities $\Pr(B_y - A_x = xy | x, y)$ are equal to $\frac{1}{2}(1 + \frac{1}{\sqrt{2}})$, such that
$\xi_0 = \xi_1 = \frac{1}{\sqrt{2}}$. For case (iii) the joint probabilities are given by
$\Pr(B_y - A_x = 0 | x = 0, y) = \frac{1}{2}(1 + \frac{1}{\sqrt{2}})$ and $\Pr(B_y - A_x = xy | x = 1, y) = \frac{1}{2}$
for $y = 0, 1$. Obviously, this channel is asymmetric for general input marginal probabilities.

In the following discussion, we denote the mutual information $I(a_0; \beta | b = 0)$ as $I_0$
and $I(a_1; \beta | b = 1)$ as $I_1$, which are functions of two input marginal probabilities,
namely, $\Pr(a_0 = 0)$ and $\Pr(a_1 = 0)$. Here $I_i$ can be thought of as the mutual information
for the sub-channel between $a_i$ and $\beta$, and the corresponding noise parameter is $\xi_i$.
The mutual information $I$ is just $I = I_0 + I_1$. Note that $I_0$ does not depend on
$\Pr(B_y - A_x = xy | x, y = 1)$ and $I_1$ does not depend on $\Pr(B_y - A_x = 0 | x, y = 0)$. Thus,
the sub-channel for $I_0$ can be made symmetric by just requiring that the probabilities
$\Pr(B_y - A_x = xy | x, y = 0)$ for $x = 0, 1$ are equal, and similarly for the sub-channel for
$I_1$. An important feature of these symmetric channels is that $I_i$ depends only on
$\Pr(a_i)$, not on $\Pr(a_{(i+1) \bmod 2})$.

For case (i), both sub-channels are symmetric. Moreover, since $\xi_0 = 1$ and $\xi_1 = 0$,
the sub-channel for $\xi_0$ is noiseless and the one for $\xi_1$ is completely noisy. This
leads to $I_1 = 0$ and $I = I_0$. The dependence of $I = I_0$ on the input marginal probability,
i.e., on $\Pr(a_0 = 0)$ only, is shown in Figs. 7-8. Note that $I$ reaches its maximal value,
1, at $\Pr(a_0 = 0) = \frac{1}{2}$, as expected for the noiseless symmetric channel. This point is
nothing but the point of maximal $I$ in Fig. 4. Note that this maximum saturates the bound
set by information causality. This implies that we can reach the causally allowed bound on
the mutual information by sacrificing one of the sub-channels without any compromise. This
is a bit surprising.
FIG. 7. $I = I_0$ vs. $\Pr(a_{0,1} = 0)$ for case (i).

FIG. 8. Density plot of the figure to its left (Fig. 7).

For case (ii), the channel is both symmetric and isotropic. We therefore expect that the
isotropy will also appear in the plot of $I$ vs. the input marginal probabilities, and that
$I_0$ and $I_1$ will have the same shape. This is indeed the case, as shown in Figs. 9-12. Note
that $I_i$ depends only on $\Pr(a_i)$, though $I = I_0 + I_1$ depends on both. We see that the
maximal value of $I$ occurs at the symmetric point, i.e., all the $\Pr(a_i)$ equal to $\frac{1}{2}$.
However, the maximal value is 0.7983, which is less than the bound of 1 from information
causality but is the same value as for the case of maximal quantum communication complexity.
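
The values quoted for cases (i) and (ii) can be cross-checked with a few lines of Python. This is only a minimal sketch under a stated assumption: each symmetric sub-channel is modeled as a binary symmetric channel in which Bob's guess equals $a_i$ with probability $(1 + \xi_i)/2$. The function names are ours and purely illustrative.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a binary variable with Pr(0) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def subchannel_information(pr_a_zero, xi):
    """I(a_i; beta | b = i) for a binary symmetric sub-channel.

    Assumption of this sketch: Bob's guess equals a_i with probability
    (1 + xi) / 2, whatever the value of a_i.
    """
    p_correct = 0.5 * (1.0 + xi)
    # Pr(beta = 0) = Pr(a_i = 0) * p_correct + Pr(a_i = 1) * (1 - p_correct)
    pr_beta_zero = pr_a_zero * p_correct + (1.0 - pr_a_zero) * (1.0 - p_correct)
    return binary_entropy(pr_beta_zero) - binary_entropy(p_correct)


def total_information(pr_a0_zero, pr_a1_zero, xi0, xi1):
    """I = I_0 + I_1 for two independent symmetric sub-channels."""
    return (subchannel_information(pr_a0_zero, xi0)
            + subchannel_information(pr_a1_zero, xi1))


# Case (i): xi_0 = 1, xi_1 = 0; case (ii): xi_0 = xi_1 = 1/sqrt(2).
case_i = total_information(0.5, 0.5, 1.0, 0.0)
case_ii = total_information(0.5, 0.5, 1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))
print("case (i): ", case_i)
print("case (ii):", round(case_ii, 4))
```

Under this model, case (i) saturates the bound of 1 at uniform inputs, while case (ii) gives a value close to the 0.7983 quoted above. Moving either input marginal away from 1/2 lowers the corresponding $I_i$.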

Finally, for case (iii), i.e., the particular asymmetric channel, the $I_i$'s now depend on
both $\Pr(a_i)$'s, unlike in the previous two cases. However, the mutual information $I$ has
the isotropic form as in case (ii), but with a far smaller maximal value at the symmetric
point. The results are shown in Figs. 13-16.

Our results above imply that the closer $\Pr(B_y - A_x = xy | x, y)$ is to 1, the larger the
mutual information $I$. This is consistent with our RAC protocol, as Bob can perfectly
guess Alice's inputs by using the PR box [3]. Of course, information causality ensures
that the NS-box constrained by quantum mechanics cannot be the PR box. Also, note that
the maximum of $I$ occurs at the symmetric point of the input marginal probabilities for
cases (ii) and (iii), but this is not the case for case (i). Therefore, uniform input
marginal probabilities do not always lead to the maximal $I$.
FIG. 9. $I_0$ vs. $\Pr(a_{0,1} = 0)$ for case (ii).

FIG. 10. $I_1$ vs. $\Pr(a_{0,1} = 0)$ for case (ii).

FIG. 11. $I$ vs. $\Pr(a_{0,1} = 0)$ for case (ii).

FIG. 12. Density plot of the figure to its left (Fig. 11).

3. Information causality for the most general channels 

After testing information causality for the more general channels discussed in the
previous sections, we may wonder whether information causality holds for the most
general channels, i.e., channels without any additional constraint on the joint
probabilities and the input marginal probabilities except the necessary quantum and
no-signaling constraints. For our d = 2, k = 2 RAC protocol, we check this by partitioning
the defining domains of the probabilities into 100 points and then using a brute-force
method to do the numerical check. We find that information causality is always satisfied.
This yields more general support for information causality.
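
The flavor of such a brute-force check can be conveyed by a much smaller scan. The sketch below is not the check performed in this work: it is restricted to symmetric sub-channels, models each of them as a binary symmetric channel with success probability $(1 + \xi_i)/2$, and replaces the full quantum constraints by the single condition $\xi_0^2 + \xi_1^2 \le 1$. All three choices are assumptions made for illustration, as are the function names and grid sizes.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a binary variable with Pr(0) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def subchannel_information(pr_a_zero, xi):
    """Mutual information of a binary symmetric sub-channel (assumed model)."""
    p_correct = 0.5 * (1.0 + xi)
    pr_beta_zero = pr_a_zero * p_correct + (1.0 - pr_a_zero) * (1.0 - p_correct)
    return binary_entropy(pr_beta_zero) - binary_entropy(p_correct)


def brute_force_maximum(xi_steps=50, marginal_steps=10):
    """Scan a grid of noise parameters and input marginals.

    Returns the largest I = I_0 + I_1 found and the grid point
    (xi_0, xi_1, Pr(a_0 = 0), Pr(a_1 = 0)) where it occurs.
    """
    best_value, best_point = -1.0, None
    for i in range(xi_steps + 1):
        xi0 = i / xi_steps
        for j in range(xi_steps + 1):
            xi1 = j / xi_steps
            # Assumed stand-in for the quantum constraint on symmetric channels.
            if xi0 * xi0 + xi1 * xi1 > 1.0 + 1e-12:
                continue
            for m in range(marginal_steps + 1):
                info0 = subchannel_information(m / marginal_steps, xi0)
                for n in range(marginal_steps + 1):
                    total = info0 + subchannel_information(n / marginal_steps, xi1)
                    if total > best_value:
                        best_value = total
                        best_point = (xi0, xi1, m / marginal_steps, n / marginal_steps)
    return best_value, best_point


value, point = brute_force_maximum()
print("largest I on the grid:", value)
print("noise parameters at the maximum:", point[0], point[1])
```

Within this restricted family the scan never exceeds the bound of 1, and the bound is reached only when one noise parameter equals 1 and the other equals 0.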

Furthermore, we find that information causality is saturated, i.e., $I = 1$, when one
of the sub-channels is noiseless and the other one is completely noisy. This is similar to
case (i) discussed in the previous subsection.

FIG. 13. $I_0$ vs. $\Pr(a_{0,1} = 0)$ for case (iii).

FIG. 14. $I_1$ vs. $\Pr(a_{0,1} = 0)$ for case (iii).

FIG. 15. $I$ vs. $\Pr(a_{0,1} = 0)$ for case (iii).

FIG. 16. Density plot of the figure to its left (Fig. 15).

VI. CONCLUSION 

Information causality was proposed as a new physical principle and gives an intuitive
picture of the meaning of causality from the information point of view. Testing its
validity in different settings therefore helps to establish it as a physical principle.
Motivated by this, in this work we extend the framework of the original proposal to more
general communication protocols, such as multi-level and multi-setting protocols, or
protocols without the conditions of a symmetric channel or uniform input marginal
probabilities. We then test information causality for these general protocols either by
adopting SDP for a numerical check or by using the brute-force method for the more general
communication channels. Our results are rewarding: information causality is preserved in
all the protocols discussed in this work. This reinforces the validity of information
causality, though more checks for more general protocols are always welcome. We also find
that information causality is saturated not by sharing the correlations with the maximal
quantum non-locality, but by the ones which are marginally non-local. This raises the issue
of the intimate relation between the shared mutual information and quantum non-locality.
In particular, this result challenges the intuition that a channel can transfer more
information with more quantum non-local resources. We think our findings in this paper
will shed some light on the related topics.
Appendix A: Signal decay and data processing inequality for multi-nary channels

In this appendix, we will first sketch the key steps of [10] in obtaining the maximal bound 
on the signal decay for the binary noisy channels, and then generalize this derivation to the 
one for the multi-nary channels. 

Our setup is to consider a cascade of two communication channels: $X \to Y \to Z$. The
decay of the signal is implied by the data processing inequality, i.e.,

$I(X;Z) \le I(X;Y)$.    (A1)

The mutual information is $I(X;Y) = H(Y) - \sum_i \Pr(X = i) H(Y|X = i)$, where $H(Y)$ and
$H(Y|X)$ are the Shannon entropies for the probability $\Pr(Y)$ and the conditional
probability $\Pr(Y|X)$, respectively.

Furthermore, for the binary symmetric channel $A$ characterized by

(A2)

it was shown in [10] that the signal decay is characterized by the following bound:

(A3)
In this appendix, we will generalize the above result to the $d$-nary channel
characterized by $\Pr(Z = i | Y = i) = \xi$ and $\Pr(Z = s \neq i | Y = i) = \frac{1-\xi}{d-1}$ with
$i \in \{0, 1, \ldots, d-1\}$, so that the signal decay is bounded by

(A4)

1. Sketch of the proof in [10] 

The derivation in [10] consists of two key steps. The first one is to show the following
theorem for weak signals:

Theorem I: The ratio $I(X;Z)/I(X;Y)$ reaches its maximum if the conditional probabilities
$\Pr(Y|X = 0)$ and $\Pr(Y|X = 1)$ are almost indistinguishable, i.e.,
$|\Pr(Y = 0|X = 0) - \Pr(Y = 0|X = 1)| \to 0$.

To prove this theorem we need the following lemma:

Lemma I: For any strictly concave functions $f$ and $g$ on the interval $[0, 1]$, and any
$p \in [0, 1]$, the ratio

$r(x,y) = g_2(x,y,p) / f_2(x,y,p)$    (A5)

reaches its maximum in the limit $|x - y| \to 0$. Here
$f_2(x,y,p) = f(px + (1-p)y) - p f(x) - (1-p) f(y)$ denotes the second-order difference of
the function $f$ with the weight $p$, and similarly for $g_2(x,y,p)$.

We sketch the proof of this lemma, which will be useful when generalizing to the multi-nary
channel. We assume that the ratio $r$ reaches its maximum at $x = x^*$ and $y = y^*$, and for
concreteness assume $x^* < y^*$. Note that $0 < r < \infty$ due to the concavity of $f$ and $g$.
We can perform an affine transformation to scale this maximal value $r(x^*, y^*, p)$ to 1, and
also to make $f(x^*) = g(x^*)$ and $f(y^*) = g(y^*)$. This immediately leads to
$f(px^* + (1-p)y^*) = g(px^* + (1-p)y^*)$. That is, there is a point $z^* = px^* + (1-p)y^*$
inside the interval at which $f$ also equals $g$. Using this fact, it is easy to convince
oneself that either $r(z^*, y^*) \ge r(x^*, y^*)$ or $r(x^*, z^*) \ge r(x^*, y^*)$. For more subtle
details, please see [10]. By repeating this procedure we prove the lemma.

Observe that $I(X;Y)$ and $I(X;Z)$ are the second-order differences of the (concave) entropy
functions $H(Y)$ and $H(Z)$, respectively, with the weight $p = \Pr(X = 0)$. We can then prove
Theorem I by the above lemma.
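
The content of Theorem I can be illustrated numerically. The sketch below is only an illustration under our own assumptions: $X$ is a uniform bit, $\Pr(Y = 0|X = 0) = 1/2 + \delta$ and $\Pr(Y = 0|X = 1) = 1/2 - \delta$, and the channel from $Y$ to $Z$ is a binary symmetric channel with flip probability `flip` (a parameterization chosen here, not necessarily that of (A2)). The mutual information is evaluated as the weighted second-order difference of the entropy.

```python
import math


def entropy(dist):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(q * math.log2(q) for q in dist if q > 0.0)


def mix(weights, rows):
    """Average of the distributions in `rows` with the given weights."""
    return [sum(w * row[s] for w, row in zip(weights, rows))
            for s in range(len(rows[0]))]


def mutual_information(prior, rows):
    """I(X;Y) = H(Y) - sum_i Pr(X = i) H(Y | X = i).

    This is the second-order difference of the entropy with weights `prior`.
    """
    return entropy(mix(prior, rows)) - sum(w * entropy(r) for w, r in zip(prior, rows))


def through_channel(dist, channel):
    """Push a distribution over Y through a row-stochastic matrix Pr(Z | Y)."""
    return [sum(dist[y] * channel[y][z] for y in range(len(dist)))
            for z in range(len(channel[0]))]


def decay_ratio(delta, flip, p=0.5):
    """I(X;Z) / I(X;Y) for the cascade X -> Y -> Z described above."""
    rows_y = [[0.5 + delta, 0.5 - delta], [0.5 - delta, 0.5 + delta]]
    channel = [[1.0 - flip, flip], [flip, 1.0 - flip]]
    rows_z = [through_channel(row, channel) for row in rows_y]
    prior = [p, 1.0 - p]
    return mutual_information(prior, rows_z) / mutual_information(prior, rows_y)


for delta in (0.5, 0.25, 0.05, 0.01):
    print(delta, decay_ratio(delta, flip=0.1))
```

For flip probability 0.1 the ratio stays below 1, as the data processing inequality requires, and it grows toward $(1 - 2 \cdot 0.1)^2 = 0.64$ as the two conditional distributions become indistinguishable.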
The second step is first to rewrite the ratio $I(X;Z)/I(X;Y)$ in terms of the relative
entropy $D(p \| q) := \sum_s \Pr(p = s) \log_2 \frac{\Pr(p = s)}{\Pr(q = s)}$, that is,

$\frac{I(X;Z)}{I(X;Y)} = \frac{\sum_{i=0}^{1} \Pr(X = i) D(\Pr(Y|X = i) \cdot A \| \Pr(Y) \cdot A)}{\sum_{i=0}^{1} \Pr(X = i) D(\Pr(Y|X = i) \| \Pr(Y))}$.    (A6)

Then, based on the above theorem, we can parameterize the conditional probability as
$\Pr(Y|X = 0) = \vec{p} + \vec{\epsilon}$, where $\vec{p} = \sum_{i=0}^{1} \Pr(X = i) \Pr(Y|X = i)$ and
$\vec{\epsilon} = (\epsilon, -\epsilon)$ with $\epsilon$ being sufficiently small. With this condition, (A6) can be
simplified to

$\frac{I(X;Z)}{I(X;Y)} = \frac{D((\vec{p} + \vec{\epsilon}) \cdot A \| \vec{p} \cdot A)}{D(\vec{p} + \vec{\epsilon} \| \vec{p})}$.    (A7)

Note that the ratio now does not depend on $\Pr(X)$.

Finally, given the binary channel (A2), we can expand the relative entropy in terms of
$\epsilon / \Pr(Y)$, and hence also the ratio $I(X;Z)/I(X;Y)$. Then, fixing $\epsilon$ and varying
the first-order term of the ratio $I(X;Z)/I(X;Y)$ in the above expansion over $\vec{p}$, we
obtain the bound in (A3).
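
The last step, varying the weak-signal ratio over $\vec{p}$, can be mimicked numerically. The sketch below evaluates the ratio of relative entropies for a small fixed $\epsilon$ and scans $\vec{p} = (q, 1 - q)$. The channel is taken to be a binary symmetric channel with flip probability `flip`; this parameterization and the function names are assumptions of the sketch and need not coincide with (A2).

```python
import math


def relative_entropy(p, q):
    """D(p || q) in bits for two probability vectors of equal length."""
    return sum(ps * math.log2(ps / qs) for ps, qs in zip(p, q) if ps > 0.0)


def apply_bsc(dist, flip):
    """Push a binary distribution through a binary symmetric channel."""
    return [dist[0] * (1.0 - flip) + dist[1] * flip,
            dist[0] * flip + dist[1] * (1.0 - flip)]


def weak_signal_ratio(q, flip, eps=1e-4):
    """D((p + e) A || p A) / D(p + e || p) with p = (q, 1 - q), e = (eps, -eps)."""
    p = [q, 1.0 - q]
    shifted = [q + eps, 1.0 - q - eps]
    numerator = relative_entropy(apply_bsc(shifted, flip), apply_bsc(p, flip))
    return numerator / relative_entropy(shifted, p)


flip = 0.1
grid = [k / 10 for k in range(1, 10)]
best_q = max(grid, key=lambda q: weak_signal_ratio(q, flip))
print("ratio is largest at q =", best_q)
print("value there:", weak_signal_ratio(best_q, flip))
print("(1 - 2 * flip)^2 =", (1.0 - 2.0 * flip) ** 2)
```

On this grid the ratio is largest at the uniform point $q = 1/2$, where it agrees with $(1 - 2 \cdot \mathrm{flip})^2$ up to corrections that vanish with $\epsilon$.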



2. Generalizing to the multi-nary channels 

We now generalize the above derivation to the trinary noisy channels; the generalization
to the $d$-nary channel then follows. The key steps are similar to the binary ones.
The first step is to use the same method to prove the following theorem:

Theorem II: The ratio $I(X;Z)/I(X;Y)$ reaches its maximum only when all three conditional
probabilities $\Pr(Y|X = i)$ with $i = 0, 1, 2$ are almost indistinguishable.

The strategy to prove this theorem is to observe that we can treat the pair
$(\Pr(Y = 0|X = i), \Pr(Y = 1|X = i))$ for each $i$ (note that $\Pr(Y = 2|X = i)$ is not independent
of this pair) as a point inside the unit square $[0, 1] \times [0, 1]$. Then the three points
$\Pr(Y|X = i)$ for $i = 0, 1, 2$ form a triangle. We can then follow the proof of Lemma I in the
previous subsection for the trinary case. First, we assume that the maximal value of $r$
occurs at the three vertices of some triangle. We then perform an affine transformation to
rescale this maximal value to 1, and to make $f = g$ (or more specifically
$H(Y|X = i) = H(Z|X = i)$) at the three vertices of this triangle. This immediately implies
that there exists some point inside the triangle at which $f = g$. We can use this point to
construct a smaller triangle with any two of the vertices of the original triangle and show
that the ratio $r$ for this new triangle is greater than the one for the original larger
triangle. Repeating this procedure we can prove the above theorem. It is also clear that we
can generalize the theorem to the multi-nary channels by generalizing the triangle to a
convex body in the higher dimensional space.

Here, we should point out that one can always reduce the convex body to a linear
interval, so that we are back to the situation of the binary case. That is, we set all
the conditional probabilities except one to be equal, and then study the closeness condition
of the remaining two distinct conditional probabilities for the maximal ratio
$I(X;Z)/I(X;Y)$. In the following, we will always restrict to such a situation.

We then go to the second step as for the binary channel, that is, to use Theorem II
to reduce the problem of maximizing $I(X;Z)/I(X;Y)$ to the one of maximizing the ratio of
relative entropies. We rewrite the ratio of the two mutual informations as follows:

$\frac{I(X;Z)}{I(X;Y)} = \frac{\sum_{i=0}^{2} \Pr(X = i) D(\Pr(Y|X = i) \cdot A \| \Pr(Y) \cdot A)}{\sum_{i=0}^{2} \Pr(X = i) D(\Pr(Y|X = i) \| \Pr(Y))}$.    (A8)

To simplify the expression for further manipulations, we denote the average probability
of $Y$ as $\vec{p} = \sum_{i=0}^{2} \Pr(X = i) \Pr(Y|X = i)$, and parameterize the probabilities as
$\Pr(Y|X = 0) = \vec{p} + \vec{\epsilon}_0$ and $\Pr(Y|X = 1) = \vec{p} + \vec{\epsilon}_1$. Thus, the probability
$\Pr(Y|X = 2)$ is forced to be
$\vec{p} - \frac{\Pr(X = 0)}{\Pr(X = 2)} \vec{\epsilon}_0 - \frac{\Pr(X = 1)}{\Pr(X = 2)} \vec{\epsilon}_1$. The parameter vectors
$\vec{\epsilon}_0$ and $\vec{\epsilon}_1$ should be sufficiently small, as required by Theorem II, to have the
maximal ratio $I(X;Z)/I(X;Y)$. Furthermore, we reduce the triangle to the linear interval
case by assuming $\vec{\epsilon}_0 = \vec{\epsilon}_1$, i.e., $\Pr(Y|X = 0) = \Pr(Y|X = 1)$.

The ratio (A8) then becomes

$\frac{I(X;Z)}{I(X;Y)} = \frac{D((\vec{p} + \vec{\epsilon}_0) \cdot A \| \vec{p} \cdot A)}{D(\vec{p} + \vec{\epsilon}_0 \| \vec{p})}$.    (A9)

Note again that the ratio now does not depend on $\Pr(X)$.

Before expanding (A9) in powers of $\vec{\epsilon}_0$, we need to specify
$\vec{p} = (\Pr(Y = 0), \Pr(Y = 1), \Pr(Y = 2))$ and $\vec{\epsilon}_0 = (v_0, v_1, v_2)$. Note that
$v_0 + v_1 + v_2 = 0$. As for the binary channel, we expand the relative entropy in terms of
$v_i / \Pr(Y = i)$. The leading term of the expansion for the denominator of (A9) is found to be

$D(\vec{p} + \vec{\epsilon}_0 \| \vec{p}) = \frac{1}{2 \ln 2} \sum_{i=0}^{2} \frac{v_i^2}{\Pr(Y = i)}$.    (A10)
To find the expansion of the numerator, we need to specify the channel $A$ between $Y$ and
$Z$. The generic trinary channel is given by

$A = \Pr(Z|Y) = \begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{pmatrix}$,    (A11)

where the elements of the channel should satisfy $a_1 + a_2 + a_3 = 1$, $b_1 + b_2 + b_3 = 1$, and
$c_1 + c_2 + c_3 = 1$. Then, the leading term in the expansion of the numerator of (A9) is found
to be

$D((\vec{p} + \vec{\epsilon}_0) \cdot A \| \vec{p} \cdot A) = \frac{1}{2 \ln 2} \left( \frac{(v_0 a_1 + v_1 b_1 + v_2 c_1)^2}{\Pr(Z = 0)} + \frac{(v_0 a_2 + v_1 b_2 + v_2 c_2)^2}{\Pr(Z = 1)} + \frac{(v_0 a_3 + v_1 b_3 + v_2 c_3)^2}{\Pr(Z = 2)} \right)$.    (A12)

For simplicity, we only consider the symmetric trinary channel

$A = \Pr(Z|Y) = \begin{pmatrix} \xi & \frac{1-\xi}{2} & \frac{1-\xi}{2} \\ \frac{1-\xi}{2} & \xi & \frac{1-\xi}{2} \\ \frac{1-\xi}{2} & \frac{1-\xi}{2} & \xi \end{pmatrix}$.    (A13)

Then (A12) becomes

$D((\vec{p} + \vec{\epsilon}_0) \cdot A \| \vec{p} \cdot A) = \left( \frac{3\xi - 1}{2} \right)^2 \frac{1}{2 \ln 2} \sum_{i=0}^{2} \frac{v_i^2}{\Pr(Z = i)}$.    (A14)

Since for a symmetric channel the maximal mutual information is achieved for uniform input
probabilities, we assume uniform $\Pr(Y)$ and $\Pr(Z)$, so that (A9) depends only on the
variable $\xi$. We then obtain

$\frac{I(X;Z)}{I(X;Y)} \le \left( \frac{3\xi - 1}{2} \right)^2$.    (A15)

This is the generalization of (A3) for the binary channel to the trinary one.

Similarly, we can generalize the above derivation to the $d$-nary channels. If the channel
between $Y$ and $Z$ is a $d$-nary symmetric channel specified by $\Pr(Z = i | Y = i) = \xi$
and $\Pr(Z = s \neq i | Y = i) = \frac{1-\xi}{d-1}$ with $i \in \{0, 1, \ldots, d-1\}$, then the bound on
the ratio $I(X;Z)/I(X;Y)$ is given by (A4).
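
The weak-signal ratio for the $d$-nary symmetric channel can be evaluated numerically. In the sketch below the distribution of $Y$ is uniform and the perturbation raises one outcome by a small `eps` and lowers another by the same amount. The closed form `candidate_bound` is our own evaluation of the leading-order ratio, $((d\xi - 1)/(d - 1))^2$, obtained from the fact that a zero-sum perturbation is simply rescaled by $(d\xi - 1)/(d - 1)$ when it passes through the symmetric channel. It is offered as a consistency check and not as a quotation of (A4). For $d = 3$ it evaluates to $((3\xi - 1)/2)^2$.

```python
import math


def relative_entropy(p, q):
    """D(p || q) in bits for two probability vectors of equal length."""
    return sum(ps * math.log2(ps / qs) for ps, qs in zip(p, q) if ps > 0.0)


def symmetric_channel(d, xi):
    """Row-stochastic matrix with Pr(Z = i | Y = i) = xi, the rest shared equally."""
    off_diagonal = (1.0 - xi) / (d - 1)
    return [[xi if z == y else off_diagonal for z in range(d)] for y in range(d)]


def apply_channel(dist, channel):
    """Push a distribution over Y through Pr(Z | Y)."""
    d = len(dist)
    return [sum(dist[y] * channel[y][z] for y in range(d)) for z in range(d)]


def weak_signal_ratio(d, xi, eps=1e-4):
    """Ratio of relative entropies at uniform p for a small zero-sum shift."""
    p = [1.0 / d] * d
    shifted = list(p)
    shifted[0] += eps   # raise outcome 0 ...
    shifted[1] -= eps   # ... and lower outcome 1, so the shift sums to zero
    channel = symmetric_channel(d, xi)
    numerator = relative_entropy(apply_channel(shifted, channel),
                                 apply_channel(p, channel))
    return numerator / relative_entropy(shifted, p)


def candidate_bound(d, xi):
    """Leading-order value of the ratio (our evaluation, see the text above)."""
    return ((d * xi - 1.0) / (d - 1.0)) ** 2


for d, xi in ((2, 0.9), (3, 0.8), (4, 0.7)):
    print(d, xi, weak_signal_ratio(d, xi), candidate_bound(d, xi))
```

For each pair of parameters the numerically evaluated ratio agrees with the closed form to within the accuracy set by `eps`.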



Appendix B: The concavity of mutual information 

In this appendix, we show that the mutual information $I$ is not a concave function of the
joint probabilities $\Pr(B_{\vec{y}} - A_{\vec{x}} | \vec{x}, \vec{y})$ and the input marginal probabilities
$\Pr(a_i)$. Thus, we cannot formulate the problem of maximizing the mutual information $I$ as
a convex optimization problem.

First, we re-express the mutual information $I$ in terms of $\Pr(B_{\vec{y}} - A_{\vec{x}} | \vec{x}, \vec{y})$ and
$\Pr(a_i)$. If the mutual information were a concave function of these probabilities, its
second-order partial derivative with respect to each probability would be non-positive.
Here, we find a violation when calculating
$\frac{\partial^2 I}{\partial (\Pr(B_{\vec{y}} - A_{\vec{x}} = 0 | \vec{x} = 0, \vec{y} = 0))^2}$. In the following paragraphs, we
denote the joint probability $\Pr(B_{\vec{y}} - A_{\vec{x}} = 0 | \vec{x} = 0, \vec{y} = 0)$ as $V$.

The mutual information can be rewritten as

$I = \sum_{i=0}^{k-1} I_{b=i}$,    (B1)

where $I_{b=i}$ is equal to $I(a_i; \beta | b = i)$. Since the joint probability $V$ only contributes
to $I_{b=0}$, we only need to calculate $\frac{\partial^2 I_{b=0}}{\partial V^2}$. The re-expression of $I_{b=0}$ is

$I_{b=0} = \sum_{n=0}^{d-1} \sum_{j=0}^{d-1} \Pr(\beta = n, a_0 = j | b = 0) \log_2 \frac{\Pr(\beta = n, a_0 = j | b = 0)}{\Pr(\beta = n | b = 0) \Pr(a_0 = j | b = 0)}$.    (B2)

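Therefore, the first-order partial derivative with respect to
$\Pr(B_{\vec{y}} - A_{\vec{x}} = 0 | \vec{x} = 0, \vec{y} = 0)$ is

$\frac{\partial I_{b=0}}{\partial V} = \sum_{n=0}^{d-1} \sum_{j=0}^{d-1} \left[ \frac{\partial \Pr(a_0 = j, \beta = n | b = 0)}{\partial V} \log_2 \frac{\Pr(a_0 = j, \beta = n | b = 0)}{\Pr(\beta = n | b = 0) \Pr(a_0 = j | b = 0)} + \frac{1}{\ln 2} \left( \frac{\partial \Pr(a_0 = j, \beta = n | b = 0)}{\partial V} - \frac{\Pr(a_0 = j, \beta = n | b = 0)}{\Pr(\beta = n | b = 0)} \frac{\partial \Pr(\beta = n | b = 0)}{\partial V} \right) \right]$.    (B3)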

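We can express $\Pr(a_0 = j, \beta = n | b = 0)$ as a combination of the joint probabilities
$\Pr(B_{\vec{y}} - A_{\vec{x}} | \vec{x}, \vec{y})$ and the input marginal probabilities $\Pr(a_i)$ to obtain
$\frac{\partial \Pr(a_0 = j, \beta = n | b = 0)}{\partial V}$. Since the joint probabilities
$\Pr(B_{\vec{y}} - A_{\vec{x}} | \vec{x}, \vec{y})$ are subject to the normalization condition of total probability,
if $n - j \neq (d - 1)$,

$\Pr(a_0 = j, \beta = n | b = 0) = \sum_{\{a_i\}_{i \neq 0}} \Pr(B_{\vec{y}} - A_{\vec{x}} = n - j | \vec{x}, \vec{y} = 0) \Pr(a_0 = j) \prod_{k \neq 0} \Pr(a_k)$;    (B4)

if $n - j = (d - 1)$,

$\Pr(a_0 = j, \beta = n | b = 0) = \sum_{\{a_i\}_{i \neq 0}} \left( 1 - \sum_{t=0}^{d-2} \Pr(B_{\vec{y}} - A_{\vec{x}} = t | \vec{x}, \vec{y} = 0) \right) \Pr(a_0 = j) \prod_{k \neq 0} \Pr(a_k)$,    (B5)

where $\vec{x}$ in the above functions is given by the encoding of the RAC protocol, namely,
$\vec{x} := (x_1, \ldots, x_{k-1})$ with $x_i = a_i - a_0$.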
Now we can calculate the derivatives. The partial derivative

$\frac{\partial \Pr(a_0 = j, \beta = n | b = 0)}{\partial V}$    (B6)

is not equal to zero in two cases. The first one is $j = n$, for which (B6) equals
$\prod_k \Pr(a_k = n)$. The second one is $n - j = (d - 1)$, for which (B6) equals
$-\prod_k \Pr(a_k = n - (d - 1))$. Therefore, since $\Pr(\beta = n | b = 0) = \sum_j \Pr(a_0 = j, \beta = n | b = 0)$,
we obtain

$\frac{\partial \Pr(\beta = n | b = 0)}{\partial V} = \prod_k \Pr(a_k = n) - \prod_k \Pr(a_k = n - (d - 1))$.    (B7)

Inserting the above result into (B3), we find that for fixed $j$,
$\sum_{n=0}^{d-1} \frac{\partial \Pr(a_0 = j, \beta = n | b = 0)}{\partial V} = 0$; thus the second term of (B3) vanishes.

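We can then calculate the second-order derivative

$\frac{\partial^2 I_{b=0}}{\partial V^2} = \frac{1}{\ln 2} \sum_{n=0}^{d-1} \sum_{j=0}^{d-1} \left[ \left( \frac{\partial \Pr(a_0 = j, \beta = n | b = 0)}{\partial V} \right)^2 \frac{1}{\Pr(a_0 = j, \beta = n | b = 0)} - \frac{2}{\Pr(\beta = n | b = 0)} \frac{\partial \Pr(a_0 = j, \beta = n | b = 0)}{\partial V} \frac{\partial \Pr(\beta = n | b = 0)}{\partial V} + \left( \frac{\partial \Pr(\beta = n | b = 0)}{\partial V} \right)^2 \frac{\Pr(a_0 = j, \beta = n | b = 0)}{(\Pr(\beta = n | b = 0))^2} \right]$.    (B8)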

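For $d = 2$ and $k = 2$, (B8) becomes

$\frac{\partial^2 I_{b=0}}{\partial V^2} = \frac{1}{\ln 2} \left[ (\Pr(a_0 = 0) \Pr(a_1 = 0))^2 \left( \frac{1}{\Pr(a_0 = 0, \beta = 0 | b = 0)} + \frac{1}{\Pr(a_0 = 0, \beta = 1 | b = 0)} \right) + (\Pr(a_0 = 1) \Pr(a_1 = 1))^2 \left( \frac{1}{\Pr(a_0 = 1, \beta = 0 | b = 0)} + \frac{1}{\Pr(a_0 = 1, \beta = 1 | b = 0)} \right) - \left( \frac{1}{\Pr(\beta = 0 | b = 0)} + \frac{1}{\Pr(\beta = 1 | b = 0)} \right) (\Pr(a_0 = 0) \Pr(a_1 = 0) - \Pr(a_0 = 1) \Pr(a_1 = 1))^2 \right]$.    (B9)

Once $\Pr(a_0 = 0) = 1 - \Pr(a_1 = 0)$, the last term vanishes and the above function is
non-negative.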
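
The sign of this second derivative can be checked numerically for $d = 2$, $k = 2$. The sketch below is written under our own reading of the protocol: $x = a_0 \oplus a_1$, Bob's guess is $\beta = a_0 \oplus (B - A)$ for $b = 0$, and $W$ denotes $\Pr(B - A = 0 | x = 1, y = 0)$. The symbol $W$ and the function names are introduced here for illustration only. The closed form coded below is the three-term expression of (B9), and it is compared with a central finite difference of $I_{b=0}$.

```python
import math


def joint_distribution(V, W, p0, q0):
    """Pr(a_0 = j, beta = n | b = 0) for d = 2, k = 2 (assumed encoding).

    V = Pr(B - A = 0 | x = 0, y = 0), W = Pr(B - A = 0 | x = 1, y = 0),
    p0 = Pr(a_0 = 0), q0 = Pr(a_1 = 0), with x = a_0 XOR a_1.
    """
    p1, q1 = 1.0 - p0, 1.0 - q0
    return {
        (0, 0): p0 * (q0 * V + q1 * W),
        (0, 1): p0 * (q0 * (1.0 - V) + q1 * (1.0 - W)),
        (1, 1): p1 * (q1 * V + q0 * W),
        (1, 0): p1 * (q1 * (1.0 - V) + q0 * (1.0 - W)),
    }


def information_b0(V, W, p0, q0):
    """Mutual information I(a_0; beta | b = 0) in bits."""
    joint = joint_distribution(V, W, p0, q0)
    pr_a = {0: p0, 1: 1.0 - p0}
    pr_beta = {n: joint[(0, n)] + joint[(1, n)] for n in (0, 1)}
    return sum(v * math.log2(v / (pr_a[j] * pr_beta[n]))
               for (j, n), v in joint.items() if v > 0.0)


def second_derivative_closed_form(V, W, p0, q0):
    """Three-term closed form for the second derivative with respect to V."""
    joint = joint_distribution(V, W, p0, q0)
    pr_beta = {n: joint[(0, n)] + joint[(1, n)] for n in (0, 1)}
    a = p0 * q0                       # Pr(a_0 = 0) Pr(a_1 = 0)
    b = (1.0 - p0) * (1.0 - q0)       # Pr(a_0 = 1) Pr(a_1 = 1)
    bracket = (a * a * (1.0 / joint[(0, 0)] + 1.0 / joint[(0, 1)])
               + b * b * (1.0 / joint[(1, 0)] + 1.0 / joint[(1, 1)])
               - (a - b) ** 2 * (1.0 / pr_beta[0] + 1.0 / pr_beta[1]))
    return bracket / math.log(2.0)


def second_derivative_numeric(V, W, p0, q0, h=1e-4):
    """Central finite difference of I_{b=0} with respect to V."""
    return (information_b0(V + h, W, p0, q0)
            - 2.0 * information_b0(V, W, p0, q0)
            + information_b0(V - h, W, p0, q0)) / (h * h)


V, W, p0, q0 = 0.8, 0.7, 0.6, 0.3
closed = second_derivative_closed_form(V, W, p0, q0)
numeric = second_derivative_numeric(V, W, p0, q0)
print("closed form:      ", closed)
print("finite difference:", numeric)
```

For this sample point, which does not satisfy $\Pr(a_0 = 0) = 1 - \Pr(a_1 = 0)$, both evaluations agree and are positive, so $I_{b=0}$ is locally convex rather than concave in $V$.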

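For higher $d$ and $k$, once the input marginal probabilities $\Pr(a_i)$ are uniform, we
obtain

$\frac{\partial^2 I}{\partial V^2} = \frac{\partial^2 I_{b=0}}{\partial V^2} = \frac{1}{\ln 2} \sum_{n=0}^{d-1} \frac{1}{d^{2k}} \left( \frac{1}{\Pr(a_0 = n, \beta = n | b = 0)} + \frac{1}{\Pr(a_0 = n - (d - 1), \beta = n | b = 0)} \right) > 0$.    (B10)

It is clear that the mutual information $I$ is not a concave function of the joint
probabilities $\Pr(B_{\vec{y}} - A_{\vec{x}} | \vec{x}, \vec{y})$ and the input marginal probabilities $\Pr(a_i)$.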
Appendix C: Semidefinite programming

In this appendix, we briefly introduce semidefinite programming (SDP) [23]. SDP is
the problem of optimizing a linear function subject to certain conditions associated with
a positive semidefinite matrix $X$, i.e., $v^\dagger X v \ge 0$ for $v \in \mathbb{C}^n$, which is denoted by
$X \succeq 0$. It can be formulated as the standard primal problem as follows. Given the
$n \times n$ symmetric matrices $C$ and $D_q$ with $q = 1, \ldots, m$, we would like to optimize the
$n \times n$ positive semidefinite matrix $X \succeq 0$ such that we achieve the following:

minimize $\mathrm{Tr}(C^T X)$    (C1a)
subject to $\mathrm{Tr}(D_q^T X) = b_q$, $q = 1, \ldots, m$.    (C1b)

Corresponding to the above primal problem, we can obtain a dual problem via a Lagrange
approach [24]. The Lagrange duality can be understood as follows. If the primal
problem is

minimize $f_0(x)$    (C2a)
s.t. $f_q(x) \le 0$, $q \in \{1, \ldots, m\}$,    (C2b)
$h_q(x) = 0$, $q \in \{1, \ldots, p\}$,    (C2c)

the Lagrange function can be defined as

$L(x, \lambda, \nu) = f_0(x) + \sum_{q=1}^{m} \lambda_q f_q(x) + \sum_{q=1}^{p} \nu_q h_q(x)$,    (C3)

where $\lambda_1, \ldots, \lambda_m$ and $\nu_1, \ldots, \nu_p$ are the Lagrange multipliers. From the constraints
of the problem and (C3), the minimum of $f_0$ over the feasible set is bounded from below by
the infimum of (C3) when $\lambda_1, \ldots, \lambda_m \ge 0$:

$\inf_x f_0 \ge \inf_x L(x, \lambda, \nu)$.

The Lagrange dual function is then obtained as

$g(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$.

It satisfies $g(\lambda, \nu) \le p^*$ (where $p^*$ is the optimal value of $f_0(x)$) for
$\lambda_1, \ldots, \lambda_m \ge 0$ and arbitrary $\nu_1, \ldots, \nu_p$. The dual problem is defined as

maximize $g(\lambda, \nu)$    (C4a)
s.t. $\lambda_q \ge 0$, $q \in \{1, \ldots, m\}$.    (C4b)
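
The Lagrange construction above can be made concrete with a one-variable toy problem. The example is hypothetical and chosen only because it is solvable by hand: minimize $f_0(x) = x^2$ subject to $f_1(x) = 1 - x \le 0$. The primal optimum is $p^* = 1$ at $x = 1$, and the dual function is $g(\lambda) = \lambda - \lambda^2/4$.

```python
def f0(x):
    """Objective of the toy problem."""
    return x * x


def f1(x):
    """Inequality constraint f1(x) <= 0, i.e. x >= 1."""
    return 1.0 - x


def dual_function(lam):
    """g(lam) = inf_x [f0(x) + lam * f1(x)].

    The Lagrangian x^2 + lam * (1 - x) is minimized at x = lam / 2.
    """
    x_star = lam / 2.0
    return f0(x_star) + lam * f1(x_star)


primal_optimum = f0(1.0)                      # the constraint is active at x = 1
lams = [i / 10 for i in range(0, 51)]         # multipliers lam >= 0
dual_values = [dual_function(lam) for lam in lams]
best_lam = max(lams, key=dual_function)

print("primal optimum:", primal_optimum)
print("best multiplier on the grid:", best_lam)
print("dual value there:", dual_function(best_lam))
```

Every dual value on the grid lies below the primal optimum, and the two coincide at $\lambda = 2$, so the duality gap vanishes for this toy problem.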
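We can use the same method to define the dual problem for SDP. From the primal problem
of SDP (C1), we can write down the dual function by using the minimax inequality [26]:

$\inf_{X \succeq 0,\ \mathrm{Tr}(D_q^T X) = b_q} \mathrm{Tr}(C^T X) = \inf_{X \succeq 0} \left[ \mathrm{Tr}(C^T X) + \sup_y \sum_{q=1}^{m} y_q (b_q - \mathrm{Tr}(D_q^T X)) \right]$
$= \inf_{X \succeq 0} \sup_y \left[ \sum_{q=1}^{m} y_q b_q + \mathrm{Tr}((C^T - \sum_{q=1}^{m} y_q D_q^T) X) \right]$
$\ge \sup_y \inf_{X \succeq 0} \left[ \sum_{q=1}^{m} y_q b_q + \mathrm{Tr}((C^T - \sum_{q=1}^{m} y_q D_q^T) X) \right]$
$= \sup_y \inf_{X \succeq 0} \left[ \sum_{q=1}^{m} y_q b_q + \mathrm{Tr}((C - \sum_{q=1}^{m} y_q D_q)^T X) \right]$.    (C5)

The optimal value of the dual function is bounded only for suitable vectors $y$:

$\sup_y \inf_{X \succeq 0} \left[ \sum_{q=1}^{m} y_q b_q + \mathrm{Tr}((C - \sum_{q=1}^{m} y_q D_q)^T X) \right] = \begin{cases} \sup_y \sum_{q=1}^{m} y_q b_q & \text{when } C - \sum_{q=1}^{m} y_q D_q \succeq 0, \\ -\infty & \text{otherwise.} \end{cases}$

The corresponding dual problem is

maximize $\sum_{q=1}^{m} y_q b_q$    (C6a)
s.t. $S = C - \sum_{q=1}^{m} y_q D_q \succeq 0$.    (C6b)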

If the feasible solutions of the primal problem and the dual problem attain their minimal
and maximal values, denoted as $p^*$ and $d^*$ respectively, then $p^* \ge d^*$, and the difference
$p^* - d^*$ is called the duality gap. This implies that the optimal value of the primal
problem is bounded from below by the dual problem. It then leads to the following: both the
primal and the dual problems attain their optimal solutions when the duality gap vanishes,
i.e., $d^* = p^*$.
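
A vanishing duality gap can be seen explicitly in a small SDP that is solvable in closed form. The instance below is hypothetical and chosen for illustration: $n = 2$, a single constraint with $D_1$ equal to the identity and $b_1 = 1$, so the primal problem minimizes $\mathrm{Tr}(C^T X)$ over $X \succeq 0$ with $\mathrm{Tr}(X) = 1$, and the dual problem maximizes $y$ subject to $C - y \cdot 1 \succeq 0$. Both optima equal the smallest eigenvalue of $C$. No SDP solver is used; the function names are ours.

```python
import math


def min_eigen(a, b, c):
    """Smallest eigenvalue and a unit eigenvector of C = [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    lam = mean - radius
    if b != 0.0:
        # First row of (C - lam) v = 0 gives v proportional to (b, lam - a).
        vx, vy = b, lam - a
    else:
        vx, vy = (1.0, 0.0) if a <= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return lam, (vx / norm, vy / norm)


def primal_value(a, b, c):
    """Tr(C^T X) at the feasible point X = v v^T (X >= 0 and Tr X = 1)."""
    _, (vx, vy) = min_eigen(a, b, c)
    X = [[vx * vx, vx * vy], [vx * vy, vy * vy]]
    return a * X[0][0] + 2.0 * b * X[0][1] + c * X[1][1]


def dual_value(a, b, c):
    """Largest y with C - y * identity >= 0; the dual objective is y * b_1 = y."""
    lam, _ = min_eigen(a, b, c)
    return lam


a, b, c = 2.0, 1.0, 3.0
print("primal value:", primal_value(a, b, c))
print("dual value:  ", dual_value(a, b, c))
print("another feasible X (identity / 2) gives:", 0.5 * (a + c))
```

The primal and dual values coincide, while other feasible choices of $X$, such as half the identity, give a larger objective, as expected for a minimization problem.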



Appendix D: The quantum constraints for n = 1 and n = 1 + AB certificate 

We divide this appendix into two parts. In the first part, we write down the associated
quantum constraints for $\Gamma^{(1)}$ and $\Gamma^{(1+AB)}$ when finding the bound of the Bell-type
inequality. In the second part, we estimate the number of these constraints and find an
efficient way to write them down.
1. The quantum constraints for n = 1 and n = 1 + AB certificate

When maximizing the Bell-type inequality under quantum constraints, the joint
probabilities are not given; they are variables. Therefore, when writing down the quantum
constraints (4.13b), we only need to consider the elements with specific values (0 and
1) and the relations between different elements, such as some elements being equal. For
convenience, instead of $A_{\vec{x}}$ and $B_{\vec{y}}$, we use $a \in A$ and $b \in B$ to denote Alice's and
Bob's outcomes, and $X(a)$ and $Y(b)$ are the associated measurement settings. The indices
$s, t$ of $\Gamma$ denote the associated operators, e.g., $\Gamma_{a,b} = \mathrm{Tr}(E_a E_b \rho)$.

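For $\Gamma^{(1)}$ the associated quantum constraints are

• $\Gamma^{(1)}_{1,1} = \mathrm{Tr}(\rho) = 1$.

• $\Gamma^{(1)}_{a,a'} = \delta_{a,a'} \Gamma^{(1)}_{1,a}$ if $X(a) = X(a')$.

• $\Gamma^{(1)}_{b,b'} = \delta_{b,b'} \Gamma^{(1)}_{1,b}$ if $Y(b) = Y(b')$.

• $\Gamma^{(1)}_{s,t} = \Gamma^{(1)}_{t,s}$.

We re-express $\Gamma^{(1+AB)}$ by 4 sub-matrices,

$\Gamma^{(1+AB)} = \begin{pmatrix} v_{1,1} & v_{1,2} \\ v_{2,1} & v_{2,2} \end{pmatrix}$.

Since $\Gamma^{(1+AB)}$ is a symmetric matrix, the sub-matrix $v_{2,1}$ is equal to the transpose of
$v_{1,2}$, and both sub-matrices $v_{1,1}$ and $v_{2,2}$ are symmetric matrices. Note that
$v_{1,1} = \Gamma^{(1)}$. The elements of the matrices $v_{1,2}$ and $v_{2,2}$ are constrained by the
following quantum constraints: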



• $\Gamma^{(1+AB)}_{1,ab} = \Gamma^{(1+AB)}_{a,ab} = \Gamma^{(1+AB)}_{a,b} = \Gamma^{(1+AB)}_{b,ab}$.

• $\Gamma^{(1+AB)}_{ab,a'b} = \Gamma^{(1+AB)}_{a,a'b} = \Gamma^{(1+AB)}_{a',ab}$.

• $\Gamma^{(1+AB)}_{ab,ab'} = \Gamma^{(1+AB)}_{b,ab'} = \Gamma^{(1+AB)}_{b',ab}$.

• $\Gamma^{(1+AB)}_{a,a'} = 0$, $\Gamma^{(1+AB)}_{a,a'b} = 0$, and $\Gamma^{(1+AB)}_{ab,a'b'} = 0$ if $X(a') = X(a)$ with $a \neq a'$.

• $\Gamma^{(1+AB)}_{b,b'} = 0$, $\Gamma^{(1+AB)}_{b,ab'} = 0$, and $\Gamma^{(1+AB)}_{ab,a'b'} = 0$ if $Y(b) = Y(b')$ with $b \neq b'$.

• $\Gamma^{(1+AB)}_{s,t} = \Gamma^{(1+AB)}_{t,s}$.



2. Estimating the number of constraints for n = 1 and n = 1 + AB certificates

Due to the limitation of computer memory, we need to estimate the number of these
quantum constraints for the RAC protocols with different $k$ and $d$. The dimension of
$\Gamma^{(1)}$ is $1 + (d - 1)(d^{k-1} + k)$; we denote it as $\dim$. The number of conditions
corresponding to the different quantum behaviors is as follows.

n = 1: condition                               number
symmetric matrix                               $\dim(\dim - 1)/2$
$\mathrm{Tr}(\rho) = \Gamma^{(1)}_{1,1} = 1$                 1
orthogonality                                  $\frac{(d-1)(d-2)}{2}(d^{k-1} + k)$
$E_a E_a = E_a$, $E_b E_b = E_b$               $\dim - 1$

The dimension of $\Gamma^{(1+AB)}$ is $1 + (d - 1)(d^{k-1} + k) + (d - 1)^2 (d^{k-1} k)$; we denote it as
$\dim_{1+AB}$. The number of conditions corresponding to the different quantum behaviors is as
follows.

n = 1 + AB: condition                          number
symmetric matrix                               $\dim_{1+AB}(\dim_{1+AB} - 1)/2$
$\mathrm{Tr}(\rho) = \Gamma^{(1+AB)}_{1,1} = 1$              1
orthogonality                                  $\mathrm{oth}_a + \mathrm{oth}_b + \mathrm{oth}_c$
$E_a E_a = E_a$, $E_b E_b = E_b$               $\dim_{1+AB} - 1$
same                                           $\sum_{i=1}^{7} \mathrm{same}_i$

The quantum constraints of orthogonality and commutativity force some elements of the
certificate to be zero or to be equal to one another. We now estimate the number of these
special elements in the n = 1 + AB certificate. First, we estimate the number of elements
whose value is zero.

• The variable $\mathrm{oth}_a = \frac{(d-1)(d-2)}{2}(d^{k-1} + k)$ specifies the number of zero elements
in the upper-right part of the sub-matrix $v_{1,1}$.

• The variable $\mathrm{oth}_b = 2(d-1)^2 (d-2) k d^{k-1}$ specifies the number of zero elements
in the sub-matrix $v_{1,2}$.

• The variable $\mathrm{oth}_c = \frac{(d-1)^2 d^{k-1} k}{2} \left( (d-2)(d-1)(d^{k-1} + k - 2) + (d-1)^2 - 1 \right)$
specifies the number of zero elements in the upper-right part of the sub-matrix $v_{2,2}$.

We then estimate the variables $\mathrm{same}_i$, which denote the numbers of equal pairs.

• $\Gamma^{(1+AB)}_{1,ab} = \Gamma^{(1+AB)}_{a,ab}$, $\mathrm{same}_1 = (d-1)^2 (d^{k-1} k)$.

• $\Gamma^{(1+AB)}_{a,ab} = \Gamma^{(1+AB)}_{a,b}$, $\mathrm{same}_2 = (d-1)^2 (d^{k-1} k)$.

• $\Gamma^{(1+AB)}_{b,ab} = \Gamma^{(1+AB)}_{a,b}$, $\mathrm{same}_3 = (d-1)^2 (d^{k-1} k)$.

• $\Gamma^{(1+AB)}_{ab,a'b} = \Gamma^{(1+AB)}_{a,a'b}$, $\mathrm{same}_4 = (d-1)^3 d^{k-1} k (d^{k-1} - 1)/2$.
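• $\Gamma^{(1+AB)}_{a,a'b} = \Gamma^{(1+AB)}_{a',ab}$, $\mathrm{same}_5 = (d-1)^3 d^{k-1} k (d^{k-1} - 1)$.

• $\Gamma^{(1+AB)}_{ab,ab'} = \Gamma^{(1+AB)}_{b,ab'}$, $\mathrm{same}_6 = (d-1)^3 d^{k-1} k (k - 1)/2$.

• $\Gamma^{(1+AB)}_{b,ab'} = \Gamma^{(1+AB)}_{b',ab}$, $\mathrm{same}_7 = (d-1)^3 d^{k-1} k (k - 1)$.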

After estimating the number of conditions, we consider how to write down these conditions
with minimal computer memory. Here, we use the numerical package CVXOPT [20] to calculate
the bound of the Bell-type inequality. The primal problem of the cone programming defined
in CVXOPT is

minimize $c \cdot x$    (D2a)
subject to $Ax - b = 0$,    (D2b)
$h - Gx \succeq 0$.    (D2c)

Given the vectors $c$ and $h$ and the matrices $A$ and $G$, we can optimize the linear
combination $c \cdot x$. Here the matrix $G$ is used to specify the positive semidefiniteness
constraint. To write down the positive semidefiniteness constraint of a matrix $Z$ of size
$s \times s$, we need a matrix $G$ of size $s^2 \times n$ (where $n$ is the number of variables $x$).
This means that if we reduce the number of variables, we save computer memory. To do this,
we define a single variable for two elements instead of constraining two variables to have
the same value. Likewise, elements whose value is zero need no variable at all, which
further reduces the number of variables.

After using the conditions to reduce the number of variables, we can estimate the number
of variables in the certificate.

The number of variables in $\Gamma^{(1)}$ for different RAC protocols:

n = 1     d = 2     d = 3     d = 4     d = 5
k = 2     10        50        153       364
k = 3     28        288       1596      6160
k = 4     78        1922      20706     132612
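
The entries of this table can be reproduced from the counts given earlier for the n = 1 certificate. The sketch below rests on our reading of those counts: a free variable is an upper off-diagonal element of the symmetric certificate that is not forced to vanish by orthogonality, while the diagonal is fixed by $\mathrm{Tr}(\rho) = 1$ and by $E_a E_a = E_a$, $E_b E_b = E_b$. The function names are illustrative.

```python
def certificate_dimension(d, k):
    """Dimension of the n = 1 certificate: 1 + (d - 1)(d^(k-1) + k)."""
    return 1 + (d - 1) * (d ** (k - 1) + k)


def orthogonality_zeros(d, k):
    """Number of upper off-diagonal elements forced to zero by orthogonality."""
    return (d - 1) * (d - 2) // 2 * (d ** (k - 1) + k)


def free_variables(d, k):
    """Upper off-diagonal elements minus the orthogonality zeros."""
    dim = certificate_dimension(d, k)
    return dim * (dim - 1) // 2 - orthogonality_zeros(d, k)


for k in (2, 3, 4):
    print("k =", k, [free_variables(d, k) for d in (2, 3, 4, 5)])
```

With this counting, all twelve entries of the table are recovered.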



The number of variables in $\Gamma^{(1+AB)}$ for different RAC protocols:

n = 1 + AB     d = 2     d = 3     d = 4       d = 5
k = 2          15        182       1287        5964
k = 3          82        4068      61860       474160
k = 4          486       71258     1995810     24012612



Due to the limit of the computer memory (128 GB), we could not find the bound of the
Bell-type inequality for arbitrary RAC communication protocols. We compute the bounds for
the protocols that fit within this memory and show the results in the main text.